PROLOGUE: STARTING WITH LOGIC


Law and computer science, in their classic form, employ logic to produce 
results. Edsger Dijkstra (1930-2002), one of the pioneers of computer 
science, expressed the essence of the field in its earlier times. Computer 
science had at its heart the mathematical analysis of algorithms, and thus... 


Programming is one of the most difficult branches of applied mathematics; 
the poorer mathematicians had better remain pure mathematicians.¹


This pithy summation of computer science—folding it into a branch of 
applied mathematics—is resilient. It remains at the root of the widespread 
view of computer science, still taught in undergraduate courses and 
echoed in explanations to the general public, that computers run algorithms,
which are step-by-step instructions for performing a task.² They
might be complicated, too hard for “poorer mathematicians” to under- 
stand, but in the end they are formulaic. They are logical processes, read- 
ily designed, readily evaluated for success or failure, and readily fixed, so 
long as you have the analytic skills needed to understand their logic. 

In law, thinking long followed lines much like those Dijkstra described 
in computer science. Not least among the early modern exponents of clas- 
sic legal thinking, there was Sir William Blackstone (1723-1780), who, 
when writing On the Study of the Law, set out as belonging to the essentials
that the student...


can reason with precision, and separate argument from fallacy, by the clear
simple rules of pure unsophisticated logic ... can fix his attention, and 
steadily pursue truth through any the most intricate deduction, by the use 
of mathematical demonstrations ... [and] has contemplated those maxims 
reduced to a practical system in the laws of imperial Rome ...³


The well-schooled lawyer, like the better mathematician, gets to the cor- 
rect result as surely as the well-designed algorithm generates a satisfactory 
computer output. As intricate as the deductions might be and thus de- 
manding on the intellect, the underlying process is “pure unsophisticated 
logic.” 

But to sum up computer science that way is out of date. The current 
boom in artificial intelligence, driven by machine learning, is not about 
deducing logical results from formulae. It is instead based on inductive 
prediction from datasets. The success of machine learning does not derive 
from better mathematics. It derives instead from bigger datasets and bet- 
ter understanding of the patterns those datasets contain. In the chapters 
that follow, we will explore this revolution in computer science. 

A revolution has taken place in modern times in law as well. As one 
would expect of a shift in thinking which has far-reaching impact, more 
than one thinker has been involved. Nevertheless, in law, one figure over 
the past century and a half stands out. Oliver Wendell Holmes Jr., the title 
of whose famous essay The Path of the Law we borrow in paraphrase for 
the title of our book, influenced the law and motivated changes in how 
people think about the law. To such an extent did Holmes affect legal 
thinking that his work marks a turning point. 

We believe that the revolution in computer science that machine learn- 
ing entails mirrors the revolution in law in which Oliver Wendell Holmes 
Jr. played so prominent a part. This book describes both revolutions and 
draws an analogy between them. Our purpose here is to expose the fun- 
damental contours of thought of the machine learning age—its concep- 
tual foundations—by showing how these trace a similar shape to modern 
legal thought; and by placing both in their wider intellectual setting. Get- 
ting past a purely technological presentation, we will suggest that machine 
learning, for all its novelty and impact, belongs to a long history of change 
in the methods people use to make sense of the world. Machine learning 
is a revolution in thinking. It has not happened, however, in isolation. 
Machine learning deserves an account that relates it both to its immedi-
ate antecedents in computer science and to another socially vital endeavor. 
Society at large deserves an account that explains what machine learning 
really is. 


HOLMES AND HIS LEGACY


As the nineteenth century drew to a close in America, growth and change 
characterized practically every field of endeavor. Legal education partook 
of the upward trend, and the Boston University School of Law, then still 
a relative newcomer in the American city that led the country in academic 
endeavor, built a new hall. To mark the opening of the new hall, which 
was at 11 Ashburton Place, the dean and overseers of the School invited 
Holmes to speak. Then aged 55 and an Associate Justice of the Mas- 
sachusetts Supreme Judicial Court, Holmes was a local luminary, and he 
could be counted on to give a good speech. There is no evidence that the 
School was looking for more than that. The speech that Holmes gave on 
January 8, 1897, however, pronounced a revolution in legal thought. Its 
title was The Path of the Law.⁴ Published afterward in the Harvard Law
Review, this went on to become one of the most cited works of any ju- 
rist. The Path of the Law, not least of all Holmes’s statement therein that 
the law is the “prophecies of what the courts will do in fact,” exercises an 
enduring hold on legal imagination. Holmes rejected “the notion that a 
[legal system]... can be worked out like mathematics from some general 
axioms of conduct.”⁷ He instead defined law as consisting of predictions
or “prophecies” found in the patterns of experience. Law might start out
as an operation of logical deduction but, according to Holmes, it had to be
understood as something else if it was to be understood fully.
It had to be understood as a process of induction with its grounding in 
modern ideas of probability. 

Holmes’s earlier postulate, that “[t]he life of the law has not been logic; 
it has been experience,” likewise has been well-remembered.⁹ Holmes
was not telling lawyers to make illogical submissions in court or to give 
their clients irrational advice. Instead, he meant to lead his audience to 
new ways of thinking about their discipline. Law, in Holmes’s view, starts 
to be sure from classic logic, but logic gets you only so far if you hope to 
understand the law. 

Holmes lived from 1841 to 1935, and so longevity perhaps contributed 
to his stature. There was also volume of output. Holmes authored over 
800 judgments, gave frequent public addresses many of which are set
down in print, and was a prolific correspondent with friends, colleagues,
and the occasional stranger.¹⁰ There is also quotability.¹¹ Holmes has
detractors¹² and champions.¹³ He has been the subject of “cycles of
intellectual anachronisms, panegyrics, and condemnations.”¹⁴ It is not to our
purpose to add to the panegyrics or to the condemnations. We do
wholeheartedly embrace anachronism! Actually, we do not deny the limits of
an analogy that spans two disciplines and more than a century of change; we will
touch on some of the limits (Chapter 3). Yet, even so, Holmes’s con- 
ception of the law, in its great shift from formal deduction to inductive 
processes of pattern searching, prefigured the change from traditional al- 
gorithmic computing to the machine learning revolution of recent years. 
And, going further still, Holmes posited certain ideas about the process of 
legal decision making—in particular about the effect of past decisions and 
anticipated future decisions on making a decision in a case at hand—that 
suggest some of the most forward-thinking ideas about machine learning 
that computer scientists are just starting to explore (Chapter 9). There 
is also a line in Holmes’s thought that queried whether, notwithstanding 
the departure from formal proof, law might someday, through scientific 
advances that uncover new rules, find its way back to its starting point in 
logic. Here too an inquiry that Holmes led over a century ago in law may 
be applied today as we consider what the machine learning age holds in 
store (Chapter 10). 

As for the law in his day as Holmes saw it, and as many have since, it 
must be seen past its starting point in deductive reasoning if one is to make 
sense of it.¹⁵ Law is, according to Holmes, not logic, but experience,
meaning that the full range of past decisions, rules, and social influences
is what really matters in law. An “inductive turn” in Holmes’s thinking
about law,¹⁶ and in law as practiced and studied more widely, followed.
In computer science, the distinct new factor has been the emergence of 
data as the motive force behind machine learning. How much weight is 
to be attributed to logic, and how much to experience or data, is a point 
of difference among practitioners both in law and in computer science. 
The difference runs deep in the history and current practice of the fields, 
so much so that in law and in computer science alike it marks a divide in 
basic understandings. 

Jurists refer to formalists and realists when describing the divide in legal 
understanding that concerns us here. The formalists understand law as 
the application of logical rules to particular questions. The realists see it, 
instead, as the discovery of patterns of behavior in a variety of legal and
social sources. The formalists see their approach to law as the right place 
to start, and the strictest among them see the emergence of legal realism 
as a setback, not an advance, for law. The realists, for their part, sometimes 
dismiss the formalists as atavistic. The divide runs through the professional 
communities of advocates, advisers, and judges as much as through legal 
academia.¹⁷

The divide in computer science is neither as storied nor as sharply de- 
fined as that in law. It is not associated with any such widely accepted 
monikers as those attached to the logic-based formalists or the pattern- 
seeking realists in law. As we will explore further below, only in recent 
years has computing come to be a data-driven process of pattern finding. 
Yet the distinction between the deductive approach that is the basis of 
classic computer algorithms, and the inductive approach that is the basis 
of present-day advances in machine learning, is the central distinction in 
what may prove to be the central field of technological endeavor of the 
twenty-first century. The emergence of machine learning will be at best 
imperfectly understood if one does not recognize this conceptual shift 
that has taken place. 

What is involved here is no less than two great revolutions in theory 
and practice, underway in two seemingly disparate fields but consisting 
in much the same shift in basic conception. From conceiving of law and 
computer science purely as logical and algorithmic, people in both fields 
have shifted toward looking for patterns in experience or data. To arrive 
at outcomes in either field still requires logic but, in the machine learning 
age, just as in the realist conception of law that emerged with Holmes, 
the path has come to traverse very different terrain. 


A NOTE ON TERMINOLOGY: MACHINE LEARNING, 
ARTIFICIAL INTELLIGENCE, AND NEURAL NETWORKS 


In this book, we will refer to machine learning. Our goal in the following 
chapters is to explain what machine learning is—but before proceeding 
it may be helpful to say a few words to clarify the difference between 
machine learning, artificial intelligence, and neural networks.¹⁸

Artificial intelligence refers in academia to an evolving field which 
encompasses many areas from symbolic reasoning to neural networks. 
In popular culture, it encompasses everything from classic statistics
rebranded by a marketing department (“Three times the AIs as the next
leading brand!”) to science fiction, invoking pictures of robotic brains
and conundrums about the nature of intelligence. 

Machine learning is a narrower term. The United Kingdom House 
of Lords Select Committee on Artificial Intelligence in its 2018 report¹⁹
highlights the difference: “The terms ‘machine learning’ and ‘artificial
intelligence’ are ... sometimes conflated or confused, but machine learning
is in fact a particular type of artificial intelligence which is especially dom- 
inant within the field today.” The report goes on to say, “We are aware 
that many computer scientists today prefer to use ‘machine learning’ given 
its greater precision and lesser tendency to evoke misleading public per- 
ceptions.” Broadly speaking, machine learning is the study of computer 
systems that use systematic mathematical procedures to find patterns in 
large datasets and that apply those patterns to make predictions about 
new situations. Many tools from classical statistics can be considered to 
be machine learning, though machine learning as an academic discipline 
can be said to date from the 1980s.²⁰
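
To make this definition concrete, here is a minimal sketch in Python of a procedure that finds a pattern in data and then applies it to a new situation. It uses ordinary least squares, one of the tools from classical statistics that can be counted as machine learning. The function names (fit_line, predict) and the figures are invented for illustration; they are assumptions of this sketch, not taken from any system discussed in this book.

```python
def fit_line(xs, ys):
    """Find the straight line y = slope * x + intercept that best fits
    the data in the least-squares sense (the 'pattern' in the data)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted pattern to a situation not seen before."""
    slope, intercept = model
    return slope * x + intercept


# Invented data: past observations of some quantity x and an outcome y.
past_x = [1.0, 2.0, 3.0, 4.0, 5.0]
past_y = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(past_x, past_y)
print(predict(model, 6.0))  # prediction for a new value of x
```

Nobody wrote down the rule relating x to y in this sketch. The program recovered it from the examples, and a different set of examples would yield a different rule from the same code.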

Artificial neural network refers to a specific design of machine learn- 
ing system, loosely inspired by the connections of neurons in the brain. 
The first such network, the Perceptron, was proposed by Frank Rosenblatt
of the Cornell Aeronautical Laboratory in 1958.²¹ The original Perceptron
had a simple pattern of connections between its neurons. Networks
with more complex patterns are called deep neural networks, and
the mathematical procedure by which they learn is called deep learning. 
The current boom²² in artificial intelligence is based almost entirely on
deep learning, and one can trace it to a single event: in 2012, in an annual
competition called the ImageNet Challenge, in which the object is
to build a computer program to classify images,²³ a deep neural network
called AlexNet²⁴ beat all the other competitors by a significant margin.
Since then, a whole host of tasks, from machine translation to playing Go, 
have been successfully tackled using neural networks. It is truly remarkable 
that these problems can be solved with machine learning, rather than
requiring some grander human-like general artificial intelligence. The reason
it took from 1958 to 2012 to achieve this success is mostly attributable 
to computer hardware limitations: it takes a huge amount of processing 
on big datasets for deep learning to work, and it was only in 2012 that 
computer hardware’s exponential improvement met the needs of image 
classification.²⁵ It also has helped that the means for gathering and
storing big datasets have improved significantly since the early days.
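
For readers who want to see how small the original idea was, the following Python sketch trains a single artificial neuron using the classic perceptron learning rule. It is an illustration under stated assumptions, not a reconstruction of Rosenblatt's machine: the task (learning the logical AND of two inputs), the function names, the learning rate, and the number of passes over the data are all choices made here for the example.

```python
def step(z):
    """Threshold activation: the neuron fires (1) or does not fire (0)."""
    return 1 if z > 0 else 0


def classify(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: after each mistake, nudge the weights
    toward the answer the example says was correct."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - classify(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data (an assumption of this sketch): the logical AND function.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
for inputs, target in AND_EXAMPLES:
    print(inputs, classify(weights, bias, inputs), "expected", target)
```

The weights start at zero and are adjusted only in response to errors on the examples. A deep neural network differs in scale and in the procedure used to adjust its weights, but the principle of learning from labeled examples is the same.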
In this book, we will use the term machine learning, and we will
not stray any further into artificial intelligence. We have neural networks 
in mind, but our discussion applies to machine learning more widely. 
Whether machine learning and neural networks have a role to play in
the possible future emergence of a general AI (that is to say, a machine
that matches or exceeds human intelligence), we will not even
speculate.²⁶


NOTES 


1. Dijkstra, How Do We Tell Truths That Might Hurt? in Dijkstra, SELECTED 
WRITINGS ON COMPUTING: A PERSONAL PERSPECTIVE (1982) 129 (orig- 
inal text dated June 18, 1975). 

2. Countless iterations of this description appear in course materials on com- 
puter programming. See, e.g., http://computerscience.chemeketa.edu/ 
cs160Reader/Algorithms/AlgorithmsIntro.html; http://math.hws.edu/ 
javanotes/c3/s2.html. For a textbook example, see Schneider & Gersting, 
INVITATION TO COMPUTER SCIENCE (1995) 9. Schneider and Gersting, 
in their definition, stipulate that an algorithm is a “well-ordered collection 
of unambiguous and effectively computable operations that when executed 
produces a result and halts in a finite amount of time.” We will say some 
more in Chapter 9 about the “halting problem”: see p. 109. 

3. Sir William Blackstone, COMMENTARIES ON THE LAWS OF ENGLAND: 
BOOK THE FIRST (1765) 33.

4. Holmes, The Path of the Law, 10 Harv. L. Rev. 457 (1896-97). 

5. Fred R. Shapiro in The Most-Cited Law Review Articles Revisited, 71
Chi.-Kent L. Rev. 751, 767 (1996) acknowledged that the fifth-place ranking
of The Path of the Law reflected serious undercounting, because the only 
citations counted were those from 1956 onward. Only a small handful of 
Shapiro’s top 100 were published before 1956. Shapiro and his co-author 
Michelle Pearse acknowledged a similar limitation in a later update of the 
top citation list: The Most-Cited Law Review Articles of All Time, 110 
Mich. L. Rev. 1483, 1488 (2012). The Path came in third in Shapiro
and Pearse’s 2012 ranking: id. at 1489. 

6. Illustrated, for example, by the several symposiums on the occasion of 
its centennial: See 110 Harv. L. Rev. 989 (1997); 63 Brook. L. Rev.
1 (1997); 78 B.U. L. Rev. 691 (1998) (and articles that follow in each
volume). Cf. Alschuler, 49 Fla. L. Rev. 353 (1997) (and several responses
that follow in that volume). 

7. Holmes, The Path of the Law, 10 Harv. L. Rev. 457, 465 (1896-97). 
9. A search discloses over three hundred instances of American judges
quoting the phrase in judgments, some three dozen of these being judgments
of the U.S. Court of Appeals, some half dozen of the U.S. Supreme Court.

10. Thus, the 1995 collection edited by Novick of mostly non-judicial writings
(but not personal correspondence) runs to five volumes.

11. It would fill over a page to give citations to American court judgments,
state and federal, referring to Holmes as “pithy” (from Premier-Pabst
Sales Co. v. State Bd. of Equalization, 13 F.Supp. 90, 95 (District Court,
S.D. California, Central Div.) (Yankwich, DJ, 1935) to United States v.
Thompson, 141 F.Supp.3d 188, 199 (Glasser, SDJ, 2015)) or “memorable”
(from Regan & Company, Inc. v. United States, 290 F.Supp.470,
(District Court, E.D. New York) (Rosling, DJ, 1968) to Great Hill Equity
Partners IV, et al. v. SIG Growth Equity Fund I, et al., (unreported, Court
of Chancery, Delaware) (Glasscock, VC, 2018)). On aesthetics and style
in Holmes’s writing, see Mendenhall, Dissent as a Site of Aesthetic
Adaptation in the Work of Oliver Wendell Holmes Jr., 1 Brit. J. Am. Legal
Stud. 517 (2012) esp. id. at 540-41.

12. For example Ronald Dworkin & Lon L. Fuller. See Ronald Dworkin,
LAW’S EMPIRE (1986) 13-14; Lon Fuller, Positivism and Fidelity to
Law: A Reply to Professor Hart, 71 Harv. L. Rev. 630 esp. id. at 657-58
(1958).

13. Richard A. Posner is perhaps the most prominent of the champions in
the late twentieth and early twenty-first centuries. See Posner’s
Introduction in THE ESSENTIAL HOLMES: SELECTIONS FROM THE LETTERS,
SPEECHES, JUDICIAL OPINIONS, AND OTHER WRITINGS OF OLIVER
WENDELL HOLMES, JR. (1992). Cf. H.L.A. Hart, Positivism and the
Separation of Law and Morals, 71 Harv. L. Rev. 593 (1958) (originally the
Oliver Wendell Holmes Lecture, Harvard Law School, April 1957). Further
to a curious link that Hart seems to have supplied between Holmes
and computer science, see Chapter 10, p. 123.

14. Pohlman (1984) 1. Cf. Gordon (1992) 5: Holmes has “inspired, and...
continues to inspire, both lawyers and intellectuals to passionate attempts
to come to terms with that legend—to appropriate it to their own
purposes, to denounce and resist it, or simply to take it apart to see what it
is made of.”

15. Holmes, THE COMMON LAW (1881) 1.

16. The apt phrase “inductive turn” is the one used in the best treatment of
Holmes’s logic: Frederic R. Kellogg, OLIVER WENDELL HOLMES JR. AND
LEGAL LOGIC (2018) pp. 35, 72-87, about which see further Chapter 1, 2.

17. For a flavor of the critique of formalism, see Frederick Schauer’s treatment
of the Supreme Court’s judgment in Lochner v. New York and its reception:
Frederick Schauer, Formalism, 97 Yale L. J. 509, 511-14 (1988); and
for frontal defenses (albeit from very different quarters), Antonin Scalia,
The Rule of Law as a Law of Rules, 56 U. Chi. L. Rev. 1175 (1989) and
James Crawford, Chance, Order, Change: The Course of International Law,
in Hague Academy of Int’l Law, 365 RECUEIL DES COURS 113, 113-35
(2013). Further to formalism, see Chapter 1, p. 2; Chapter 2, pp. 20-21.

18. The 2018 Report of the UN Secretary-General on Current developments in
science and technology and their potential impact on international security
and disarmament efforts put the relation between the terms like this:


Modern artificial intelligence comprises a set of sub-disciplines and 
methods that leverage technology, such as data analysis, visual, 
speech and text recognition, and robotics. Machine learning is one 
such sub-discipline. Whereas hand-coded software programmes typ- 
ically contain specific instructions on how to complete a task, ma- 
chine learning allows a computer system to recognize patterns in 
large data sets and make predictions. Deep learning, a subset of machine
learning, implements various machine-learning techniques in
layers based on neural networks, a computational paradigm loosely 
inspired by biological neurons. Machine-learning techniques are 
highly dependent on the quality of their input data, and arguably 
the quality of the data is more important to the success of a system 
than is the quality of the algorithm. A/73/177 (July 17, 2018). Cf. 
Chapter 1, p. 14, n. 12. 


The proper distinction between the three terms has led to heated ar- 
gument between technologists. See e.g., https://news.ycombinator.com/ 
item?id=20706174 (accessed Aug. 24, 2019). 

19. Select Committee on Artificial Intelligence (Lords), Report (Apr. 16,
2018) pp. 15, 17.

20. See, e.g., Efron & Hastie (2016) 351. See also Leo Breiman as quoted in
Chapter 1, p. 1. 

21. Rosenblatt (1958).

22. Artificial intelligence has experienced a series of booms and “AI winters.”
For a broader history of artificial intelligence, see Russell & Norvig (2016) 
5-27. Cf. National Science and Technology Council (U.S.), The National
Artificial Intelligence Research and Development Strategic Plan (Oct. 2016)
pp. 12-14, describing three “waves” of AI development since the 1980s. 
23. The ImageNet database was announced in 2009: J. Deng, W. Dong, R.
Socher, L.-J. Li, K. Li & L. Fei-Fei, ImageNet: A Large-Scale Hierarchical 
Image Database, CVPR, 2009. The first ImageNet Challenge was in 2010. 
For the history of the Challenge, see Olga Russakovsky*, Jia Deng*, Hao 
Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej 
Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li
Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition
Challenge. IJCV, 2015. 

24. Krizhevsky, Sutskever & Hinton (2017) 60(6) Comms. ACM 84-90.

25. The computational power comes from better hardware in the form of
graphics processing units (GPUs). The computer gaming industry spurred 
the development of hardware for better graphics, and this hardware was 
then used to speed up the training of neural networks. 

Cognitive neuroscientists have observed a correlation between the de- 
velopment of eyes and brain size: See, e.g., Gross, Binocularity and Brain 
Evolution in Primates, (2004) 101(27) PNAS 10113-15. See also
Passingham & Wise, THE NEUROBIOLOGY OF THE PREFRONTAL CORTEX:
ANATOMY, EVOLUTION AND THE ORIGIN OF INSIGHT (2012). Thus in 
some, albeit very general sense, a link is suggested both in biological evo- 
lution and in the development of computer science between the increase 
in processing power (if one permits such an expression in regard to brains 
as well as GPUs) and the demands of dealing with imagery. 

26. For speculation about an impending “singularity” (a future moment when
AI emerges with capacities exceeding human cognition), see Bostrom,
SUPERINTELLIGENCE: PATHS, DANGERS, STRATEGIES (2014).


CHAPTER 1


Two Revolutions 


What constitutes the law? You will find some text writers telling you that it is 
something different from what is decided by the courts of Massachusetts or England, 
that it is a system of reason, that it is a deduction from principles of ethics or 
admitted axioms or what not, which may or may not coincide with the decisions. 
But if we take the view of our friend the bad man we shall find that he does not 
care two straws for the axioms or deductions, but that he does want to know what 
the Massachusetts or English courts are likely to do in fact. I am much of his mind. 
The prophecies of what the courts will do in fact, and nothing more pretentious, are 
what I mean by the law. 


Oliver Wendell Holmes, Jr. The Path of the Law (1897) 


In the mid-1980s two powerful new algorithms for fitting data became available: 
neural nets and decision trees. A new research community using these tools sprang 
up. Their goal was predictive accuracy. The community consisted of young computer 
scientists, physicists and engineers plus a few aging statisticians. They began using 
the new tools in working on complex prediction problems where it was obvious that 
data models were not applicable: speech recognition, image recognition, nonlinear 
time series prediction, handwriting recognition, prediction in financial markets. 


Leo Breiman, Statistical Modeling: The Two Cultures (2001) 


Machine learning, the method behind the current revolution in artificial 
intelligence,¹ may serve a vast range of purposes. People across practically
every walk of life will feel its impact in the years to come. Not many of 
them, however, have any very clear idea how machine learning works. 
The present short book describes how machine learning works. It does
so with a surprising analogy. 

Oliver Wendell Holmes, Jr., one of the law’s most influential figures 
in modern times, by turns has been embraced for the aphoristic quality 
of his writing and indicted on the charge that he reconciled himself too 
readily to the injustices of his day. It would be a mistake, however, to take 
Holmes to have been no more than a crafter of beaux mots, or to look 
no further than the judgment of some that he lacked moral compass. 
That would elide Holmes’s role in a revolution in legal thought—and the 
remarkable salience of his ideas for a revolution in computer science now 
under way. 

Holmes in the years immediately after the American Civil War engaged 
with leading thinkers of the nineteenth century, intellectuals who were 
taking a fresh look at scientific reasoning and logic and whose insights 
would influence a range of disciplines in the century to come. The 
engagement left an imprint on Holmes and, through his work as scholar 
and as judge, would go on to shape a new outlook on law. Holmes played 
a central role in what has recently been referred to as an “inductive turn” 
in law,² premised on an understanding that law in practice is not a
system of syllogism or formal proof but, instead, a process of discerning
patterns in experience. Under his influence, legal theory underwent a change
from deduction to induction, from formalism to realism. This change has 
affected the law in theory and in practice. It is oft-recounted in modern 
legal writing. A formalist view of legal texts—seeing the law as a formula 
that can be applied to the factual situations the legislator promulgated the 
law to address—remains indispensable to understanding law; but formal- 
ism, for better or worse, no longer suffices if one is to understand how 
lawyers, judges, and others involved in the law actually operate. 

In computer science, a change has occurred which today is having 
at least as much impact, but in most quarters remains unknown or, at 
best, imprecisely grasped. The new approach is an inductive data-driven 
approach, in which computers are “trained” to make predictions. It had 
its roots in the 1950s and has come to preeminence since 2012. The clas- 
sic view of computing, by contrast, is that computers execute a series of 
logical steps that, applied to a given situation, lead to completion of a 
required task; it is the programmer’s job to compose the steps as an algo- 
rithm to perform the required task. Machine learning is still built on com- 
puters that execute code as a series of logical steps, in the way they have 
since the start of modern computing,³ but this is not an adequate
explanation of what makes machine learning such a powerful tool, so powerful
that people talk of it as the point of departure toward a genuine artificial 
intelligence. 
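
The contrast between the two views of computing can be sketched in a few lines of Python. In the first function the programmer composes the rule; in the second, the rule is induced from labeled examples. Everything here is a hypothetical illustration: the data, the cut-off of 50, and the names hand_coded_rule and learn_threshold are assumptions made for this sketch, and the learning method (trying each midpoint between neighboring values and keeping the one with the fewest errors) is only one simple way of fitting such a rule.

```python
def hand_coded_rule(x):
    """Classic view: the programmer decides the rule in advance."""
    return x > 50


def learn_threshold(examples):
    """Inductive view: choose the cut-off that makes the fewest
    mistakes on the labeled examples."""
    values = sorted(value for value, _ in examples)
    # Candidate cut-offs: midpoints between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Invented past cases: (measurement, outcome that was actually observed).
EXAMPLES = [(10, False), (20, False), (30, False),
            (40, True), (50, True), (60, True)]

threshold = learn_threshold(EXAMPLES)


def learned_rule(x):
    """Prediction based on the pattern found in past cases."""
    return x > threshold


print("hand-coded:", hand_coded_rule(38), "learned:", learned_rule(38))
```

Both functions still run as a series of logical steps. What differs is where the rule comes from: in the second case it is a product of the data, so changing the examples changes the rule without anyone rewriting the program.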


1.1 AN ANALOGY AND WHY WE’RE MAKING IT


In this book, we describe, in the broadest sense, how machine learn- 
ing does what it does. We argue that the new and unfamiliar terrain of 
machine learning mirrors with remarkable proximity Holmes’s concep- 
tion of the law. Just as the law is a system of “prophesy from experience,” 
as Holmes put it, so too machine learning is an inductive process of pre- 
diction based on data. We consider the two side by side in the chapters 
that follow for two mutually supporting purposes: in order to convey a 
better understanding of machine learning; and in order to show that the 
concepts behind machine learning are not a sudden arrival but, instead, 
belong to an intellectual tradition whose antecedents stretch back across 
disciplines and generations. 

We will describe how machine learning differs from traditional algo- 
rithmic programming—and how the difference between the two is strik- 
ingly similar to the difference between the inductive, experience-based 
approach to law so memorably articulated by Holmes and the formalist, 
text-based approach that that jurist contrasted against his own. Law and 
computing thus inform one another in the frame of two revolutions in 
thought and method. We’ll suggest why the likeness between these two 
revolutions is not happenstance. The changes we are addressing have a 
shared origin in the modern emergence of ideas about probability and 
statistics. 

Those ideas should concern people today because they have practical 
impact. Lawyers have been concerned with the impact of the revolution in 
their own field since Holmes’s time. It is not clear whether technologists’ 
concern has caught up with the changes machine learning has brought 
about. Technologists should concern themselves with machine learning 
not just as a technical project but also as a revolution in how we try 
and make sense of the world, because, if they don’t, then the people 
best situated to understand the technology won’t be thinking as much as 
they might about its wider implications. Meanwhile, social, economic, and 
political actors need to be thinking more roundly about machine learning 
as well. These are the people who call upon our institutions and rules to 
adapt to machine learning; some of the adaptations proposed to date are 
not particularly well-conceived.⁴

New technologies of course have challenged society before the
machine learning age. Holmes himself was curious and enthusiastic about 
the technological change which over a hundred years ago was already 
unsettling so many of the expectations which long had lent stability to 
human relations. He did not seem worried about the downsides of the 
innovations that roused his interest. We turn to Holmes the futurist in 
the concluding part of this book by way of postscript; scientism—an 
unexamined belief that science and technology can solve any problem—is 
not new to the present era of tech-utopians. 

Our principal concern, however, is to foster a better understanding 
of machine learning and to locate this revolution in its wider setting 
through an analogy with an antecedent revolution in law. We address 
machine learning because those who make decisions about this technol- 
ogy, whether they are concerned with its philosophical implications, its 
practical potential, or safeguards to mitigate its risks, need to know what 
they are making decisions about. We address it the way we do because 
knowledge of a thing grows when one sees how it connects to other 
things in the world around it. 


1.2 WHAT THE ANALOGY BETWEEN 
A NINETEENTH CENTURY JURIST 
AND MACHINE LEARNING CAN TELL US


The claim with which we start, which we base on our understanding of 
the two fields that the rest of this book will consider, is this: Holmes’s 
conception of the law, which has influenced legal thought for close to a 
century and a half, bears a similar conceptual shape and structure to that 
which computing has acquired with the recent advances in machine learn- 
ing. One purpose in making this claim is to posit an analogy between a 
change in how people think about law, and a change that people need 
to embrace in their thinking about how computers work—if they are to 
understand how computers work in the present machine learning age. 
The parallels between these two areas as they underwent profound trans- 
formation provide the organizing idea of this book. 

Despite the myriad uses for machine learning and the considerable attention
it receives, few people outside the immediate specialty branches of computer
science and statistics avoid basic misconceptions about what it is.
Even within the specialties, few experts have perspective on the concep- 
tual re-direction computer science in recent years has taken, much less an 
awareness of its kinship to revolutionary changes that have shaped another 
socially vital field. The analogy that we develop here between law and
machine learning supplies a new way of looking at the latter. In so doing, 
it helps explain what machine learning is. It also helps explain where 
machine learning comes from: the recent advances in machine learning 
have roots that reach deeply across modern thought. Identifying those 
roots is the first step toward an intellectual history of machine learning. It 
is also vital to understanding why machine learning is having such impact 
and why it is likely to have still more in the years ahead. 

The impact of machine learning, realized and anticipated, identifies it 
as a phenomenon that requires a social response. The response is by no 
means limited to law and legal institutions, but arriving at a legal classi- 
fication of the phenomenon is overdue. Lawyers and judges already are 
called upon to address machine learning with rules.5 And, yet, legislative
and regulatory authorities are at a loss for satisfactory definition. We
believe that an analogy between machine learning and law will help. 

But what does an analogy tell us that a direct explanation does not?

One way to gain understanding of what machine learning is, is by
enumerating what it does. Here, for example, is a list of application areas
supplied by a website aimed at people considering careers in data science: 


• Game-playing
• Transportation (automated vehicles)
• Augmenting human physical and mental capabilities (“cyborg” technology)
• Controlling robots so they can perform dangerous jobs
• Protecting the environment
• Emulating human emotions for the purpose of providing convincing robot companions
• Improving care for the elderly
• General health care applications
• Banking and financial services
• Personalized digital media
• Security
• Logistics and distribution (supply chain management)
• Digital personal assistants
• E-commerce
• Customizing news and market reports.


Policy makers and politicians grappling with how to regulate and promote
AI make lists like this too. The UK House of Lords, for example, having 
set up a Select Committee on Artificial Intelligence in 2017, published 
a report of the Committee which, inter alia, listed a number of specific 
fields which are using AI. An Executive Order of the President of the 
United States, adopted in 2019, highlighted the application of AI across 
diverse aspects of the national economy.’ The People’s Republic of China 
Ministry of Industry and Information Technology adopted an Action Plan 
in 2017 for AI which identified a range of specific domains in which AI’s 
applications are expected to grow.10 But while lists of applications can
reflect where the technology is used today, they don’t indicate where it 
might or might not be used in the future. Nor do such lists convey the 
clearer understanding of how AI works that we need if we are to address 
it, whether our purpose is to locate AI in the wider course of human 
development to which it belongs or to adjust our institutions and laws 
so that they are prepared for its impact, purposes which, we suggest, are 
intertwined. AI is a tool, and naming things the tool does is at best only 
a roundabout route to defining it. 

Suggesting the limits in that approach, others attempting to define arti- 
ficial intelligence have not resorted to enumeration. To give a high profile 
example, the European Commission, in its Communication in 2018 on 
Artificial Intelligence for Europe, defined AI as “systems that display intel- 
ligent behavior by analyzing their environment and taking actions—with 
some degree of autonomy—to achieve specific goals.”11 This definition
refers to AI as technology “to achieve specific goals”; it does not list what 
those goals might be. It is thus a definition that places weight not on 
applications (offering none of these) but instead on general character- 
istics of what it defines. However, defining AI as “systems that display 
intelligent behaviour” is not adequate either; it is circular. Attempts to 
define machine learning and artificial intelligence tend to rely on syn- 
onyms that add little to a layperson’s understanding of the computing 
process involved.12 In a machine learning age, more is needed if one is
both to grasp the technical concept and to intuit its form. 

In this book, we do not continue the search for synonyms or compile 
an index of extant definitions. Nor do we undertake to study how AI 
might be applied to particular practical problems in law or other disci- 
plines. Instead, we aim to develop and explore an analogy that will help 
people understand machine learning. 

The value of analogy as a means to understand this topic is suggested
when one considers how definitions of unfamiliar concepts work. Carl 
Hempel, one of the leading thinkers on the philosophy of science in the 
twentieth century, is known by American lawyers for the definition of 
“science” that the U.S. Supreme Court espoused in Daubert, a landmark 
in twentieth century American jurisprudence.13 Hempel was concerned
as well with the definition of “definition.” He argued that definition 
“requires the establishment of diverse connections... between different aspects 
of the empirical world.”14 It is from the idea of diverse connections that
we take inspiration. We posit an analogy between two seemingly unre- 
lated fields and with that analogy elucidate the salient characteristics of 
an emerging technology that is likely to have significant effects on many 
fields in the years to come.15 We aim with this short book to add to, and
diversify, the connections among lawyers, computer scientists, and others
as well, who should be thinking about how to think about the machine
learning age which has now begun. 

We will touch on the consequences of the change in shape of both law 
and computing, but our main concern lies elsewhere—namely, to supply 
the reader with an understanding of how precisely under a shared intellec- 
tual influence those fields changed shape and, moreover, with an under- 
standing of what machine learning—the newer and less familiar field—is. 


1.3 APPLICATIONS OF MACHINE 
LEARNING IN LAW—AND EVERYWHERE ELSE 


Writers in the early years of so-called artificial intelligence, before machine 
learning began to realize its greater potential, were interested in how 
computers might affect legal practice.16 Many of them were attempting
to find ways to use AI to perform particular law-related tasks. Some
noted the formalist-realist divide that had entered modern legal
thinking.17 Scholars and practitioners who considered law and AI were
interested in the contours of the former because they wished to see how one
might get a grip on it using the latter, like a farmer contemplating a stone 
that she needs to move and reckoning its irregularities, weight, position, 
etc. before hooking it up to straps and pulleys. Thus, to the extent they 
were interested in the nature of law, it was because they were interested 
in law as a possible object to which to apply AI, not as a source of insight 
into the emergence of machine learning as a distinct way that computers 
might be used. 

Investigation into practical applications of AI, including in law, has
been reinvigorated by advances that machine learning has undergone in 
recent years. The advances here have taken place largely since 2012.18 In 
the past several years, it seems scarcely a day goes by without somebody 
suggesting that artificial intelligence might supplement, or even replace, 
people in functions that lawyers, juries, and judges have performed for 
centuries.19 In regard to functions which the new technology already
widely performs, it is asked what “big data” and artificial intelligence 
imply for privacy, discrimination, due process, and other areas of con- 
cern to the law. An expanding literature addresses the tasks for which 
legal institutions and the people who constitute them use AI or might 
come to in the future, as well as strategies that software engineers use, or 
might in the future, to bring AI to bear on such tasks.20 In other words,
a lot is being written today about AI and law as such. The application of 
AI in law, to be sure, has provoked intellectual ferment, certain practical 
changes, and speculation as to what further changes might come. 

But the need for a well-informed perspective on machine learning is 
not restricted to law. We do not propose here to address, much less to 
solve, the technical challenges of putting AI in harness to particular prob- 
lems, law-related or other. It is not our aim here to compile another list of 
examples of tasks that AI performs, any more than it is our purpose to list 
examples of the subject matter that laws regulate. Tech blogs and policy 
documents, like the ones we just referred to above, are full of suggestions 
as to the former; statute books and administrative codes contain the lat- 
ter. Nor is it our purpose here to come up with programming strategies 
for the application of AI to tasks in particular fields; tech entrepreneurs 
and software engineers are doing that in law and many fields besides. 

There are law-related problems—and others—that people seek to 
employ machine learning to solve, but cataloguing the problems does 
not in itself impart much understanding of what machine learning is. The
concepts that we deal with here concern how the mechanisms work, not 
(or not primarily) what they might do (or what problems they might 
be involved in) when they work. Getting at these concepts is necessary, if 
the people who ought to understand AI today, lawyers included, are actu- 
ally to understand it. Reaching an understanding of how its mechanisms 
work will locate this new technology in wider currents of thought. It is 
on much the same wider currents that the change in thinking about law 
that we address took place. This brings us to the common ancestor of the 
two revolutions. 


1.4 TWO REVOLUTIONS WITH A COMMON ANCESTOR


Connections between two things in sequence do not necessarily mean 
that the later thing was caused by the one that came before, and a jurist 
who died in 1935 certainly was not the impetus behind recent advances 
in computer science. We aren’t positing a connection between Holmes’s 
jurisprudence and machine learning in that sense, nor is it our aim to offer 
an historical account of either law or computer science writ large. Our 
goal in this book is to explain how machine learning works by making an 
analogy to law—following Hempel’s suggestion that connections across 
different domains can help people understand unfamiliar concepts. 

Nonetheless, it is interesting to note an historical link between law 
and the mathematical sciences: the development of probabilistic thinking. 
According to philosopher of science Ian Hacking, 


[A]round 1660 a lot of people independently hit on the basic probabil- 
ity ideas. It took some time to draw these events together but they all 
happened concurrently. We can find a few unsuccessful anticipations in the 
sixteenth century, but only with hindsight can we recognize them at all. 
They are as nothing compared to the blossoming around 1660. The time, 
it appears, was ripe for probability.21


It’s perhaps surprising to learn about the link between probability the- 
ory and law. In fact, the originators of mathematical probability were all 
either professional lawyers (Fermat, Huygens, de Witt) or the sons of 
lawyers (Cardano and Pascal).22 At about the time Pascal formulated his
famous wager about belief in God,23 Leibniz thought of applying numerical
probabilities to legal problems; he later called his probability theory
“natural jurisprudence.”24 Leibniz was a law student at the time, though
he is now better known for his co-invention of the differential calculus
than for his law.25 Leibniz developed his natural jurisprudence in order
to reason mathematically about the weight of evidence in legal argument,
thereby systematizing ideas that began with the Glossators of Roman Law
in the twelfth century.26 Law and probability theory both deal with
evidence; the academic field of statistics is the science of reasoning about
evidence using probability theory. Statistical theory for calculating the
weight of evidence is now well understood.27 Leibniz, if he were alive
today, might find it interesting that judges are sometimes skeptical about 
statistics; but even where (as in a murder case considered by the Court 
of Appeal of England and Wales in 2010) courts have excluded statistical 
theory for some purposes, they have remained open to it for others.28
Whether or not a given court in a given case admits statistical theory 
into its deliberations, the historical link between lawyers and probability 
remains. 

As we said, though, our concern here is not with history as such. 
Probabilistic thinking is not only an historical link between law and the 
mathematical sciences. It is also the motive force behind the two modern 
revolutions that we are addressing. Machine learning (like any success- 
ful field) has many parents, but it’s clear from any number of textbooks 
that probability theory is among the most important. As for Holmes, his 
particular interests in his formative years were statistics, logic, and the dis- 
tinction between deductive and inductive methods of proof in science; he 
later wrote that “the man of the future is the man of statistics.”29 That
Holmes’s milieu was one of science and wide-ranging intellectual inter- 
ests is a well-known fact of biography; his father was an eminent medical 
doctor and researcher, and the family belonged to a lively community of 
thinkers in Boston and Cambridge, the academic and scientific center of 
America at the time. 

Less appreciated until recently is how wide and deep Holmes’s engage- 
ment with that community and its ideas had been. Frederic R. Kellogg, 
in a magisterial study published in 2018 entitled Oliver Wendell Holmes 
Jr. and Legal Logic, has brought to light in intricate detail the ground- 
ings Holmes acquired in science and logic before his rise to fame as 
a lawyer. Holmes’s interlocutors included the likes of his friends Ralph 
Waldo Emerson, Chauncey Wright, and the James brothers, William and 
Henry.30 Holmes’s attendance at the Lowell Lectures on logic and scientific
induction delivered by Charles Peirce in 1866 exercised a particular
and lasting influence on Holmes’s thought.31 Holmes spent a great deal
of time as well with the writings of John Stuart Mill, including Mill’s A
System of Logic, Ratiocinative and Inductive. (He met Mill in London
in 1866; they dined together with engineer and inventor of the electric
clock Alexander Bain.32) Diaries and letters from the time record Holmes
absorbed in conversation with these and other thinkers and innovators.
Holmes eventually conceded his “Debauch on Philosophy” would have
to subside if he ever were to become a practicing lawyer.33

Holmes clearly was interested in statistics, and statistics can be used 
to evaluate evidence in court. But Holmes’s famous saying, to which we 
will return below, that law is nothing more than “prophecies of what the 
courts will do,” points to a different use of probability theory: it points to 
prediction. Traditional statistical thinking is mostly concerned with making
inferences about the truth of scientific laws and models, at least in so
far as scientific models can be said to be “true.” For example, an expert 
might propose an equation for the probability that a prisoner will reof- 
fend, or that a defendant is guilty of a murder, and the statistician can 
estimate the terms in the equation and quantify their confidence. A dif- 
ferent type of thinking was described by Leo Breiman in a rallying call 
for the nascent discipline of machine learning: he argued that prediction 
about individual cases is a more useful goal than inference about gen- 
eral rules, and that models should be evaluated purely on the accuracy 
of their predictions rather than on other scientific considerations such as 
parsimony or interpretability or consonance with theory.34 For example, a
machine learning programmer might build a device that predicts whether 
or not a prisoner will reoffend. Such a device can be evaluated on the 
accuracy of its predictions. True, society at large might insist that scrutiny 
be placed on the device to see whether its predictions come from sound 
considerations, whether using it comports with society’s values, etc. But, 
in Breiman’s terms, the programmer who built it should leave all that 
aside: the predictive accuracy of the device, in those terms, is the sole 
measure of its success. We will discuss the central role of prediction both 
in Holmes’s thought and in modern machine learning in Chapters 5 and 
6. 
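
Evaluation in Breiman’s terms can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the cases are invented numbers, not real records, and the nearest-neighbour predictor is chosen only because it is short, not because the sources discussed here prescribe it. The point is that the predictor is judged solely by how often it is right on cases it has not seen; nothing in the evaluation asks why it predicts as it does.

```python
def make_nearest_neighbour(training_cases):
    """Build a predictor that copies the outcome of the most similar past case."""
    def predict(features):
        def squared_distance(case):
            known_features, _ = case
            return sum((a - b) ** 2 for a, b in zip(features, known_features))
        # Find the past case closest to the new one and reuse its outcome.
        _, outcome = min(training_cases, key=squared_distance)
        return outcome
    return predict


def accuracy(predict, cases):
    """Fraction of cases for which the prediction matches the recorded outcome."""
    hits = sum(predict(features) == outcome for features, outcome in cases)
    return hits / len(cases)


# Invented (features, outcome) pairs; the two numbers are hypothetical
# case attributes with no real-world source.
training_cases = [((2, 5), True), ((3, 4), True), ((5, 0), False), ((6, 1), False)]
held_out_cases = [((2, 4), True), ((6, 0), False), ((4, 3), False)]

predict = make_nearest_neighbour(training_cases)
score = accuracy(predict, held_out_cases)
print(f"Held-out accuracy: {score:.2f}")
```

Swapping in a different predictor changes nothing about the evaluation: `accuracy` needs only the predictions and the recorded outcomes, which is the sense in which predictive accuracy alone measures success.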

Time and again, revolutions in thought and method have coincided. 
Thomas Kuhn, among other examples in his The Structure of Scientific 
Revolutions, noted that a shift in thinking about what electricity is led 
scientists to change their experimental approach to exploring that natural 
phenomenon.35 Later, and in a rather different setting, Peter Bernstein
noted that changes in thinking about risk were involved in the emergence
of the modern insurance industry.36 David Landes considered the
means by which societies measured time, how its measurement affected
how societies thought about time, and how they thought about time in
turn affected their behaviors and institutions.37 The relations that
interested these and other thinkers have been in diverse fields and have been
of different kinds and degrees of proximity. A shift in scientific theory
well may have direct impact on the program of scientific investigation;
the transmission of an idea from theory to the marketplace might be less
direct; the cultural and civilizational effects of new conceptions of the
universe (e.g., conceptions of time) still less.38

Again, it is not our aim in this book to offer an historical account,
nor a tour d’horizon of issues in philosophy of science or philosophy of 
law. Nor is it our aim to identify virtues or faults in the transformations 
we address. In computer science, it would be beside the point to “take 
sides” as between traditional algorithmic approaches to programming and 
machine learning. The change in technology is a matter of fact, not to be 
praised or criticized before its lineaments are accurately perceived. Nor, 
in law, is it to the present point to say whether it is good or bad that 
many jurists, especially since Holmes’s time, have not kept faith with the 
formalist way of thinking about law. Battles continue to be fought over 
that revolution. We don’t join those battles here. 

What we do, instead, is propose that Holmes, in particular in his under- 
standing of law as prediction formed from the search for patterns in expe- 
rience, furnishes remarkably powerful analogies for machine learning. Our 
goal with the analogies is to explain the essence of how machine learning 
works. We believe that thinking about law in this way can help people 
understand machine learning as it is now—and help them think about 
where machine learning might go from here. People need both to grasp 
the state of the art and to think about its future, because machine learn- 
ing gives rise to legal and ethical challenges that are difficult to recognize, 
even more to address, unless they do. Reading Holmes with machine 
learning in mind, we discern lessons about the challenges. Machine learn- 
ing is a revolution in thinking, and it deserves to be understood much 
more widely and placed in a wider setting. 


NOTES 


1. As to the difference between “artificial intelligence” and “machine learn- 
ing,” see Prologue, pp. ix-x. 

2. Kellogg (2018) 35, 72-87. 

3. According to Brian Randell, a computer scientist writing in the 1970s, 
electronic digital computers had their origin in the late 1940s, but “[i]n 
most cases their developers were unaware that nearly all the impor- 
tant functional characteristics of these computers had been invented over 
a hundred years earlier by Charles Babbage,” an English mathemati- 
cian who had been “interested in the possibility of mechanising the 
computation and printing of mathematical tables.” Randell, The History
of Digital Computers, 12(11-12) IMA BULL. 335 (1976). Babbage
(1791-1871) designed his “difference engine” around 1821, further to 
which see Swade, DIFFERENCE ENGINE: CHARLES BABBAGE AND THE
QUEST TO BUILD THE FIRST COMPUTER (2001). See also the timeline
supplied by the Computer History Museum (Mountain View, Califor- 
nia, USA): https://www.computerhistory.org/timeline/computers/. For 
a superb overview that locates these mechanical developments in the 
history of ideas, see Historicizing the Self-Evident: An Interview with 
Lorraine Daston (Jan. 25, 2000):
https://lareviewofbooks.org/article/historicizing-the-self-evident-an-interview-with-lorraine-daston/.


4. Take for example proposals that the law confer legal personality on
“robots”: European Parliament, Civil Law Rules on Robotics resolution


CHAPTER 2


Getting Past Logic 


In law, as we saw in Chapter 1, the contrast between formalism and induc- 
tivism was evident practically from the moment jurists began to consider 
that logic might not explain everything about the law. The contrast con- 
tinues to define lines that run through legal studies and judicial politics, 
especially in the United States and also to an extent in other common 
law jurisdictions. The lines are readily discernible. Volumes of literature 
and on-going disputes are gathered on one side or the other. People 
who think about and practice law have identified themselves in adversarial 
terms by reference to which side of those lines they stand on. In com- 
puting, as we also noted in Chapter 1, the lines are not drawn in quite 
such clear relief. They are certainly not the reference point for the intel- 
lectual identity of opposing camps of computer scientists. The conceptual 
shift that underpins the emergence of machine learning has had enormous 
impact, but it has not been an object of sustained discourse, much less of 
pitched ideological battle. Computer scientists thus, perhaps, enjoy a cer- 
tain felicity in their professional relations, but they also are probably less 
alert to the distinction that machine learning introduces between their 
present endeavors and what they were doing before. 

We will examine in the three chapters immediately after this the ingre- 
dients that go into making machine learning so different from traditional 
algorithm-based programming. The ingredients are data (Chapter 3), pat- 
tern finding (Chapter 4), and prediction (Chapter 5). Before examining 
these, we wish to consider further the contrast between machine learning 
and what came before. The rise of legal realism was explicitly a challenge 
to what came before in law, and, so, the contrast was patent. With the
emergence of induction-driven machine learning, the contrast ought to 
be no less clear, but people continue to miss it. Getting past logic is nec- 
essary, if one is to get at what’s new about machine learning—and at why 
this kind of computing presents special challenges when it comes to values 
in society at large. 


2.1 FORMALISM IN LAW 
AND ALGORITHMS IN COMPUTING 


Legal formalists start with the observation, to which few would object, 
that law involves rules. They identify the task in law, whether performed 
by a lawyer or a judge, to be that of applying the rules to facts, again 
in itself an unobjectionable, or at least unremarkable, proposition. Where 
formalism is distinctive is in its claim that these considerations supply a 
complete understanding of the law. “Legal reasoning,” said the late twen- 
tieth century critical legal scholar Roberto Unger, “is formalistic when 
the mere invocation of rules and deduction of conclusions from them 
is believed sufficient for every authoritative legal choice.”1 An important
correlate follows from such a conception of the law.2 The formalists
say that, if the task of legal reasoning is performed correctly, meaning
in accordance with the logic of the applicable rules, the lawyer or
judge reaches the correct result. The result might consist in a judgment 
(adopted by a judge and binding on parties in a dispute) or in a briefing 
(to her client by a lawyer), but whatever the forum or purpose, the result 
comes from a logical operation, not differing too much from the appli- 
cation of a mathematical axiom. In the formalists’ understanding, it thus 
follows that the answer given to a legal question, whether by a lawyer 
or by a judge, is susceptible of a logical process of review. An erroneous 
result can be identified by tracing back the steps that the lawyer or judge 
was to have followed and finding a step in the operation where that tech- 
nician made a mistake. Why a correct judgment is correct thus can be 
explained by reference to the rules and reasoning on which it is based; 
and an incorrect one can be diagnosed much the same way. 

In Oliver Wendell Holmes, Jr.’s day, though legal formalism already 
had a long line of distinguished antecedents such as Blackstone (whom we 
quoted in our Prologue), one contemporary of Holmes, C. C. Langdell, 
Dean and Librarian of the Harvard Law School, had come to be specially 
associated with it. Holmes himself identified the Dean as arch-exponent 
of this mode of legal reasoning. In a book review in 1880, he referred to
Langdell, who was a friend and colleague, as “the greatest living legal
theologian.”3 The compliment was a back-handed one when spoken among
self-respecting rationalists in the late nineteenth century. In private
correspondence around the same time, Holmes called Langdell a jurist who
“is all for logic and hates any reference to anything outside of it.”4 A
later scholar, from the vantage of the twenty-first century, has suggested 
that Langdell was less a formalist than Holmes and others made him out 
to be but nevertheless acknowledges the widespread association and the 
received understanding: “[l]egal formalism [as associated with Langdell] 
consisted in the view that deductive inference from objective, immutable 
legal principles determines correct decisions in legal disputes.” 

Whether or not Langdell saw that to be the only way the law func- 
tions, Holmes certainly did not, and the asserted contrast between the 
two defined lines which remain familiar in jurisprudence to this day. In 
his own words, Holmes rejected “the notion that the only force at work 
in the development of the law is logic.”6 By this, he did not mean that
“the principles governing other phenomena [do not] also govern the
law.”7 Holmes accepted that logic plays a role in law: “[t]he processes of
analogy, discrimination, and deduction are those in which [lawyers] are
most at home. The language of judicial decision is mainly the language of
logic.” That deductive logic and inductive reasoning co-exist in law may
already have been accepted, at least to a degree, in Holmes’s time.9 What
Holmes rejected, instead, was “the notion that a [legal system]... can be
worked out like mathematics from some general axioms of conduct.”10 It
was thus that Holmes made sport of a “very eminent judge” who said “he 
never let a decision go until he was absolutely sure that it was right” and 
of those who treat a dissenting judgment “as if it meant simply that one 
side or the other were not doing their sums right, and if they would take 
more trouble, agreement inevitably would come.”11 If the strict formalist
thought that all correctness and error in law are readily distinguished
and their points of origin readily identified, then Holmes thought that 
formalism as legal theory was lacking. 

The common conception of computer science is analogous to the for- 
malist theory of law. Important features of that conception are writing 
a problem description as a formal specification; devising an algorithm, 
i.e. a step-by-step sequence of instructions that can be programmed on 
a computer; and analyzing the algorithm, for example to establish that 
it correctly solves the specified problem. “The term algorithm is used in 
computer science to describe a ... problem-solving method suitable for
implementation as a computer program. Algorithms are the stuff of computer
science: they are central objects of study in the field.”12 In some
areas the interest is in devising an algorithm to meet the specification. 
For example, given the problem statement “Take a list of names and sort
them alphabetically,” the computer scientist might decompose it recursively
into “to sort a list, first sort the first half, then sort the second half, then merge
the two halves,” and then break these instructions down further into elementary
operations such as “swap two particular items in the list.” In other
areas the interest is in the output of the algorithm. For example, given the
problem statement “Forecast the likely path of the hurricane,” the computer
scientist might split a map into cells and within each cell solve simple 
equations from atmospheric science to predict how wind speed changes 
from minute to minute. In either situation, the job of the computer sci- 
entist is to codify a task into simple steps, each step able to be (i) executed 
on a computer, and (ii) reasoned about, for example to debug why a com- 
puter program has generated an incorrect output (i.e. an incorrect result). 
The steps are composed in source code, and scrutinizing the source code 
can disclose how the program worked or failed. Success and failure are 
ready to see. The mistakes that cause failure, though sometimes frustrat- 
ingly tangled in the code, are eventually findable by a programmer keen 
enough to find them. 
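
The recursive decomposition of the sorting problem described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `merge_sort` and `merge` and the sample names are our own assumptions, and the merge step copies items into a new list rather than swapping them in place.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Compare case-insensitively so that the order is alphabetical.
        if left[i].lower() <= right[j].lower():
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the remainder of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(names):
    """To sort a list: sort the first half, sort the second half, then merge."""
    if len(names) <= 1:
        return list(names)
    mid = len(names) // 2
    return merge(merge_sort(names[:mid]), merge_sort(names[mid:]))


sorted_names = merge_sort(["Holmes", "Dijkstra", "Blackstone", "Langdell"])
print(sorted_names)
```

Every step in this program can be read, executed by hand, and reasoned about, which is the property on which the common conception of computer science relies.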


2.2 GETTING PAST ALGORITHMS


Machine learning, however, neither works like algorithmic code nor is to be
understood as if it were algorithmic code. Outputs from machine learning
are not effectively explained by considering only the source code involved.
Kroll et al., whom we will consider more closely in a moment, explain in a
discussion of how to make algorithms more accountable:


Machine learning... is particularly ill-suited to source code analysis because 
it involves situations where the decisional rule itself emerges automatically 
from the specific data under analysis, sometimes in ways that no human 
can explain. In this case, source code alone teaches a reviewer very little, 
since the code only exposes the machine learning method used and not 
the data-driven decision rule.13


In machine learning, the job of the computer scientist is to assemble
a training dataset and to program a system that is capable of learning 
from that data. The outcome of training is a collection of millions of 
fine-tuned parameter values that configure an algorithm. The algorithms 
that computer scientists program in modern machine learning are embar- 
rassingly simple by the standards of classic computer science, but they 
are enormously rich and expressive by virtue of their having millions of 
parameters. 

The backbone of machine learning is a simple method, called gradient
descent.14 It is through gradient descent that the system arrives at
the optimum settings for these millions of parameters. It is how the sys- 
tem achieves its fine-tuning. To be clear, it is not the human programmer 
who fine-tunes the system; it is a mathematical process that the human
programmer sets in motion that does the fine-tuning. Thus built on its 
backbone of gradient descent, machine learning has excelled at tasks such 
as image classification and translation, tasks where formal specification and 
mathematical logic have not worked. These achievements justify the enco- 
mia that this simple method has received. “Gradient descent can write 
code better than you.”15 After training, i.e. after configuring the algorithm
by setting its parameter values, the final stage is to invoke the algorithm
to make decisions on instances of new data. It is an algorithm that
is being invoked, in the trivial sense that it consists of simple steps which 
can be executed on a computer; but its behavior cannot be understood 
by reasoning logically about its source code, since its source code does 
not include the learnt parameter values. 
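
A toy illustration may help to show what it means for parameter values to be absent from the source code. The sketch below is our own, under stated assumptions: a model with only two parameters (`w` and `b`) rather than millions, a hand-picked step size, and five invented data points. Gradient descent repeatedly nudges the parameters in the direction that reduces the prediction error on the training data.

```python
# Toy training data (invented for illustration); it happens to follow y = 2x + 1,
# but nothing in the code below states that relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def loss(w, b):
    """Mean squared error of the predictions w*x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


w, b = 0.0, 0.0         # arbitrary starting parameter values
learning_rate = 0.05    # assumed step size, chosen by hand for this toy problem
for _ in range(5000):
    dw, db = gradient(w, b)
    # Step each parameter a little way downhill on the loss surface.
    w -= learning_rate * dw
    b -= learning_rate * db

print(round(w, 3), round(b, 3))
```

The program never mentions the values that `w` and `b` finally take; they emerge from the data. With two parameters the result is still easy to read, but the same procedure applied to millions of parameters yields values that no inspection of the source code will disclose.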

Moreover, it is futile to try to reason logically about the algorithm 
even given all the parameter values. Such an analysis would be as futile as 
analyzing a judge’s decision from electroencephalogram readings of her 
brain. There are just too many values for an analyst to make sense of. 
Instead, machine-learnt algorithms are evaluated empirically, by measur- 
ing how they perform on test data. Computer scientists speak of “black- 
box” and “white-box” analysis of an algorithm. In white-box analysis we 
consider the internal structure of an algorithm, whereas in black-box anal- 
ysis we consider only its inputs and outputs. Machine-learnt algorithms 
are evaluated purely on the basis of which outputs they generate for which 
inputs, i.e. by black-box analysis. Where computer scientists have sought 
to address concerns about discrimination and fairness, they have done so 
with black-box analysis as their basis.16 In summary, a machine learning
“algorithm” is better thought of as an opaque embodiment of its training
dataset and evaluation criterion, not as a logical rules-based procedure.
Problems in which machine learning might be implicated (such as unfair
discrimination) thus cannot be addressed as if machine learning were a
logical rules-based procedure.
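
Black-box evaluation of the kind just described can be sketched as follows. The sketch is hypothetical throughout: the names `evaluate_black_box` and `opaque_model`, the test cases, and the group labels are invented, and the per-group rate of positive decisions is only one simple statistic that might be measured, not the specific fairness test discussed in the literature cited.

```python
def evaluate_black_box(predict, test_cases):
    """Score a decision procedure using only its inputs and outputs."""
    correct = 0
    positives = {}
    totals = {}
    for features, group, expected in test_cases:
        decision = predict(features)      # the only access we have to the model
        correct += int(decision == expected)
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(decision)
    accuracy = correct / len(test_cases)
    positive_rate = {g: positives[g] / totals[g] for g in totals}
    return accuracy, positive_rate


# Hypothetical opaque model: the evaluation never looks inside this function.
def opaque_model(features):
    return features["score"] > 0.5


# Hypothetical test data: (features, group label, expected decision).
test_cases = [
    ({"score": 0.9}, "A", True),
    ({"score": 0.7}, "A", True),
    ({"score": 0.6}, "B", False),
    ({"score": 0.2}, "B", False),
]

accuracy, positive_rate = evaluate_black_box(opaque_model, test_cases)
print(accuracy, positive_rate)
```

The evaluation function would work unchanged whatever mechanism stood behind `predict`, since it relies on nothing but the decisions that mechanism returns.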


2.3 THE PERSISTENCE OF ALGORITHMIC LOGIC 


Yet people continue to address machine learning as if it were just that—a 
logical rules-based procedure not different in kind from traditional com- 
puter programming based on algorithmic logic. This inadequate way of 
addressing machine learning—addressing it as though the source code of 
an algorithm is responsible for producing the outputs—is not limited to 
legal discourse. It is, however, very much visible there. Formal, algorithmic
descriptions of machine learning are ubiquitous in legal literature.17 The
persistence of algorithmic logic in descriptions of how computers work is
visible even among legal writers who otherwise acknowledge that machine
learning is different.18

Even Kroll et al., who recognize that machine learning “is particularly 
ill-suited to source code analysis,” still refer to “a machine [that] has been 
‘trained’ through exposure to a large quantity of data and infers a rule 
from the patterns it observes.”19 To associate machine learning with “a
rule from the patterns it observes” will lead an unwary reader to conclude 
that the machine has learnt a cleanly stated rule in the sense of law or of 
decision trees. In fact, the machine has done no such thing. What it has 
done is find a pattern which is “well beyond traditional interpretation,” 
these being the much more apt words that Kroll et al. themselves use to 
acknowledge the opacity of a machine learning mechanism.20

Kroll and his collaborators have addressed at length the challenges in 
analyzing the computer systems on which society increasingly relies. We 
will turn in Chapters 6-8 to extend our analogy between two revolutions 
to some of the challenges. A word is in order here about the work of Kroll 
et al., because that work highlights both the urgency of the challenges and 
the traps that the persistence of algorithmic logic presents. 

There are many white-box tools for analyzing algorithms, for example 
based on mathematical analysis of the source code. Kroll et al. devote the 
bulk of their paper on Accountable Algorithms (2017) to white-box soft- 
ware engineering tools and to the related regulatory tools that can be used 
to ensure accountability. The persistence of algorithmic logic here, again, 
may lead to a trap: the unwary reader thinks machine learning algorithms 
are per se algorithms; here are tools for making algorithms accountable;
therefore we can make machine learning accountable. Kroll et al. mark 
the trap with a warning flag, but it is a rather small one. They concede 
in a footnote that all the white-box techniques they discuss simply do 
not apply to machine learning mechanisms and that the best that can be 
done is to regulate the decision to use machine learning: “Although some 
machine learning systems produce results that are difficult to predict in 
advance and well beyond traditional interpretation, the choice to field 
such a system instead of one which can be interpreted and governed is 
itself a decision about the system’s design.”21 This isn’t saying how to
understand machine learning better. It’s saying not to use machine learn- 
ing. The result, if one were to follow Kroll et al., would be to narrow the 
problem set to a much easier question—what to do about systems that 
use only traditional algorithmic logic. 

Indeed, it is the easier question that occupies most of Kroll et al.’s 
Accountable Algorithms. Forty-five pages of it discuss white-box analysis 
that doesn’t apply to machine learning systems. Eighteen pages then con- 
sider black-box analysis. An exploration of black-box analysis is more to 
the point—the point being to analyze machine learning. 

But a trap is presented there as well. The pages on black-box analy- 
sis are titled “Designing algorithms to ensure fidelity to substantive pol- 
icy choices.” Black-box analysis by definition is agnostic as to the design 
of the “algorithms” that are producing the results it analyzes. Black-box 
analysis is concerned with the output of the system, not with the inner 
workings that generate the output. To suggest that “[d]esigning algo- 
rithms” the right way will “ensure fidelity” to some external value is to 
fall into the algorithmic trap. It is to assume that algorithmic logic is at 
work, when the real challenge involves machine learning systems, the dis- 
tinctive feature of which is that they operate outside that logic. Black-box 
analysis, which Kroll et al. suggest relies upon the design of an algorithm, 
in fact works just as well in analyzing decisions made by a flock of dys- 
peptic parrots. 

Kroll in a later paper expands the small flag footnote and draws a distinction
between systems and algorithms.22 As Kroll uses the words, the
system includes the human sociotechnical context—the power dynamics 
and the human values behind the design goals, and so on—whereas the 
algorithm is the technical decision-making mechanism embedded in the 
system. It is the “system,” in the sense that he stipulates, that mainly 
concerns him.23 Kroll argues that a machine learning “system” is necessarily
scrutable, since it is a system built by human engineers, and human
choices can always be scrutinized. But, once more, this is an observation
that would apply just as much if the engineers’ choice was to use the 
parrot flock. It is the essence of the black-box that we know only what 
output it gives, whether a name, a color, a spatial coordinate, or a squawk. 
We don’t know how it arrived at the output. In so far as Kroll addresses 
machine learning itself, he does not offer tools to impart scrutability to 
it but, instead, only this: “the question of how to build effective white-box
testing regimes for machine learning systems is far from settled.”24
To say that white-box testing doesn’t produce “settled” answers to black- 
box questions is an understatement. And the problem it understates is the 
very problem that activists and political institutions are calling on com- 
puter scientists to address: how to test a machine learning system to assure 
that it does not have undesirable effects. What one is left with, in talking 
about machine learning this way, is a hope: namely, a hope that machine 
learning, even though it may be the most potent mechanism yet devised 
for computational operations, will not be built into “systems” by the 
many individuals and institutions who stand to profit from its use. Explo- 
ration of white-box, logical models of scrutability reveals little or nothing 
about machine learning. Insistence on such exploration only highlights 
the persistence of algorithmic logic notwithstanding the revolution that 
this technology represents. 

Many accounts of machine learning aimed at non-specialists display 
these shortcomings. Lawyers, as a particular group of non-specialists, are
perhaps particularly susceptible to misguided appeals to logical models of 
computing. It is true that statutory law has been compared to computer 
source code;25 lawyers who are at heart formalists may find the comparison
comforting.26 It is also true that an earlier generation of software
could solve certain fairly hard legal problems, especially where a statutory 
and regulatory regime, like the tax code, is concerned. Machine learning, 
however, is not limited in its applications to tasks that, like calculating a 
tax liability, are themselves algorithmic (in the sense that a human oper- 
ator can readily describe the tasks as a logical progression applying fixed 
rules to given facts). Computer source code is not the characteristic of 
machine learning that sets it apart from the kind of computing that came 
before. Lawyers must let go of the idea that logic (the stepwise deduction
of solutions by applying a rule) is what’s at work in the machine learning
age. Herein, we posit, reading Holmes has a salutary effect. 

Holmes made clear his position by contrasting it against that taken by 
jurists of what we might, if anachronistically, call the algorithmic school,
that is to say the formalists. In Lochner v. New York, perhaps his most
famous dissent, Holmes stated, “General propositions do not decide concrete
cases.”27 This was to reject deductive reasoning in plain terms and
in high profile. Whether or not we think that is a good way to think 
about law,28 it is precisely how we must think if we are to understand
machine learning; machine learning demands that we think beyond logic. 
Computer scientists themselves, as much as lawyers and other laypersons, 
ought to recognize this conceptual transition, for it is indispensable to the 
emergence of machine learning which is now transforming their field and 
so much beyond. 

With the contrast in mind between formal ways of thinking about law 
and about computing, we now will elaborate on the elements of post- 
logic thinking that law and computing share: data, pattern finding, and 
prediction. 


NOTES 


1. Unger (1976) 194. 

2. Unger’s description is not the only or necessarily the definitive description 
of legal formalism. A range of nuances exists in defining legal formalism. 
Unger’s pejorative use of the word “mere,” in particular, may well be jet- 
tisoned: to say that logical application of rules to facts is the path to legal 
results by no means entails that the cognitive operations involved—and 
opportunities for disagreement—are trivial. Dijkstra’s caution that only 
very good mathematicians are ready for the logical rigors of (traditional) 
computing merits an analogue for lawyers. 

3. Holmes, Book Notice Reviewing a Selection of Cases on the Law of Contracts,
with a Summary of the Topics Covered by the Cases, By C.C. Langdell,
14 Am. L. REV. 233, 234 (1880).

4. Letter from Holmes to Pollock (Apr. 10, 1881), reprinted De Wolfe Howe 
(ed.) (1942) 17. Though Holmes’s skepticism over the formalist approach 
most famously targeted Langdell, Walbridge Abner Field, Chief Justice of 
the Massachusetts Supreme Judicial Court, also fell into the sights. See 
Budiansky (2019) 199-200. 

5. Kimball (2009) 108 and references to writers id. at 108 n. 134. See 
also Cook 88 N.D. L. REV. 21 (2012); Wendel, 96 Corn. L. Rev. 
1035, 1060-65, 1073 (2011). Langdell’s undoubted contribution—the 
case method of teaching law—provoked rebellion by his students for the 
very reason that it did not start with clear rules: see Gersen, 130 Harv. L.
Rev. 2320, 2323 (2017). Cf. Grey, Langdell’s Orthodoxy, 45 U. PITT. L.
Rev. 1 (1983), who drew attention to the rigor and advantages of classic
formalism. 

Richard Posner, to whose understanding of Holmes’s legal method we 
will return later below (Chapter 10, p. 119), describes logic as central to 
legal reasoning of the time, though “common sense” was also involved: 


The task of the legal scholar was seen as being to extract a doc- 
trine from a line of cases or from statutory text and history, restate 
it, perhaps criticize it or seek to extend it, all the while striving 
for ‘sensible’ results in light of legal principles and common sense. 
Logic, analogy, judicial decisions, a handful of principles such as 
stare decisis, and common sense were the tools of analysis. 


Posner, 115(5) Harv. L. Rev. 1314, 1316 (2002). 


6. 10 Harv. L. REV. at 465.

7. Id. For a consideration of the role of Langdell’s logic in Holmes’s conception
of law, see Brown & Kimball, 45 Am. J. Legal Hist. 278-321
(2001).

8. 10 Harv. L. REV. at 465-66.

9. Which is not to say that Holmes necessarily would have agreed with other
jurists’ understanding of how they co-exist. Edward Levi, in his text on
legal reasoning, said this: “It is customary to think of case-law reason- 
ing as inductive and the application of statutes as deductive,” Levi, AN 
INTRODUCTION TO LEGAL REASONING (1949) 27. Common law courts 
have described their function in applying rules derived from past cases as 
involving both kinds of reasoning—induction to derive the rules; deduc- 
tion from a rule thus derived. See, e.g., Skelton v. Collins (High Court of 
Australia, Windeyer J., 1966) 115 CLR 94, 134; Home Office v. Dorset 
Yacht Co Ltd. [1970] A.C. 1004, 1058-59 (House of Lords, Diplock LJ). 
Cf. Norsk Pacific Steamship et al. v. Canadian National Railway Co., 1992 
A.M.C. 1910, 1923 (Supreme Court of Canada, 1992, McLachlin J.); In 
the Matter of Hearst Corporation et al. v. Clyne, 50 N.Y.2d 707, 717, 
409 N.E.2d 876, 880 (Court of Appeals of New York, 1980, Wachtler 
J.). Cf. Benjamin Cardozo, THE NATURE OF THE JUDICIAL PROCESS 
(1921) 22-23. Levi’s contrast—between induction for the common law 
and deduction for rules such as statutes contain—has been noted in con- 
nection with computer programming: see, e.g., Tyree (1989) 131. 

10. 10 Harv. L. Rev. at 465.

11. Id.

12. Robert Sedgewick, ALGORITHMS (4th ed.) (2011) Section 1.

13. Kroll et al., Accountable Algorithms, U. Pa. L. REV. at 638 (2017).

14. For a formal description of gradient descent, and some of its variants,
see e.g. Bishop, PATTERN RECOGNITION AND MACHINE LEARNING
(2007) Section 5.2.4. Gradient descent is attributed to the French mathematician
Augustin-Louis Cauchy (1789-1857) in a work published in
1847 (see Claude Lemaréchal, Cauchy and the Gradient Method (2012),
DOCUMENTA MATH. p. 251). Gradient descent when applied to finding
parameter values for a neural network is known as back propagation.

15. Tweet by @karpathy, 1.56 p.m., Aug. 4, 2017.
https://twitter.com/karpathy/status/893576281375219712. See further, e.g., Murphy
(2012) 247, 445.

16. Kroll et al., supra n. 76, at 650-51, 685-87. The tests for discrimination
in machine learning algorithms described there, which come from Cynthia
Dwork et al., Fairness Through Awareness, TCS Conf. Proc. (3rd)
(2012), derive from a black-box definition.

17. See for example works cited by Barocas & Selbst, Big Data’s Disparate
Impact, 104 CAL. L. Rev. 671, 674 n. 11. See also Alan S. Gutterman, 
Glossary of Computer Terms, in BUSINESS TRANSACTIONS SOLUTIONS § 
217:146 (Jan. 2019 update) (“the machine merely follows a carefully 
constructed program”); Chessman, 105 CAL. L. Rev. 179, 184 (2017) 
(“Evidence produced by computer programs arguably merits additional 
scrutiny... because the complexity of computer programs makes it diffi- 
cult... to detect errors”); Gillespie in Gillespie et al. (eds.) (2014) 167, 
192 (“algorithmic logic... depends on the proceduralized choices of a 
machine designed by human operators to automate some proxy of human 
judgment”), and the same as quoted by Carroll, Making News: Balancing
Newsworthiness and Privacy in the Age of Algorithms, 106 GEO. L.J.
69, 95 (2017). As we will observe below, algorithmic logic persists in 
legislation and regulation as well: see Chapter 6, p. 70. 

18. See for example Roth, 126 YALE L. J. 1972, 2016-2017 (2017) who
says, “[A] machine’s programming... could cause it to utter a falsehood 
by design” (emphasis added). Another example is found in Barocas & 
Selbst, 104 CAL. L. Rev. at 674 (2016), who, like Roth, understand that 
“the data mining process itself” is at the heart of machine learning, but 
still say that “[a]lgorithms could exhibit these [invidious] tendencies even 
if they have not been manually programmed to do so...”, a formulation 
which accounts for programs that are not written in the conventional way, 
but which still does not escape the gravity of the idea of source code (i.e., 
that which is “programmed”). Benvenisti has it on the mark, when he cites 
Council of Europe Rapporteur Wagner’s observation that “provision of all 
of the source codes... may not even be sufficient.” Benvenisti, Upholding 
Democracy and the Challenges of New Technology: What Role for the Law 
of Global Governance? 29 EJIL 9, 60 n. 287 (2018) quoting, inter alia, 
Wagner, Rapporteur, Committee of Experts on Internet Intermediaries 
(MSI-NET, Council of Europe), Study on the Human Rights Dimensions 
of Algorithms (2017) 22. 


19. Kroll et al., supra n. 76, at 679 (emphasis added).

20. Id., n. 9. Others have used the same terms, similarly running the risk
of conflating the output of machine learning systems with “rules.” For 
example distinguished medical researchers Ziad Obermeyer & Ezekiel J. 
Emanuel, who have written cogent descriptions of how machine learning 
works, refer to machine learning as “approach[ing] problems as a doc- 
tor progressing through residency might: by learning rules from data” 
(emphasis added). Obermeyer & Emanuel, Predicting the Future—Big
Data, Machine Learning, and Clinical Medicine, 375(13) NEJM 1216 
(Sept. 29, 2016). 

21. Kroll et al. at n. 9. The proviso “some machine learning systems” acknowledges
that at least some machine learning systems such as neural networks
are well beyond traditional interpretation. True, some simple systems
labeled “machine learning” produce easily understood rules and are
therefore amenable to white-box analysis. One hears sales pitches that use
the term “machine learning” to refer to basic calculations that could be
performed in a simple Excel spreadsheet. These are not the systems making
the emerging machine learning age, or its challenges, distinctive.

22. Kroll (2018) PHIL. TRANS. R. Soc. A 376.

23. Kroll (2018) indeed is about systems, not algorithms: e.g., it uses “algorithm”
and related words 16 times, and “system” and related words 235
times.

24. Id., p. 11. There have been attempts to find white-box approaches for
understanding neural networks, notably Shwartz-Ziv & Tishby (2017), 
but such theory is not widely accepted: Saxe et al. (2018). 

25. See esp. Lessig’s extended analogy: Lessig, CODE AND OTHER LAWS OF
CYBERSPACE (1999) 3-8, 53. See also Tyree (1989) 131. 

26. For an analogy from psychology positing that legal formalism comes
before supposedly “later stages of cognitive and moral development,” see
Huhn, 48 VILL. L. Rev. 305, 318-39 (2003). We venture no opinion
as to whether Huhn’s analogy is convincing, either for law or for
computer science.

27. Lochner v. New York, 198 U.S. 45, 76 (1905) (Holmes, J., dissenting).

28. Posner observes jurists have disputed formalism for two thousand years:
Posner (1990) 24-25. The contrast between formalism and other types 
of legal analysis, whenever the latter arose and whatever name they go 
by (“positivism,” “realism,” etc.), is at the heart of much of the intellectual
(and ideological) tension in American lawyering, judging, and law
scholarship. It is visible in international law as well. See Prologue, pp. xii-xiii,
n. 17. As we stated in Chapter 1, we address the contrast for purposes
of analogy to machine learning, not to join in a debate over the merits of 
formalism or its alternatives in law.


CHAPTER 3


Experience and Data as Input 


We are entering the era of big data. For example, there are about 1 trillion web
pages; one hour of video is uploaded to YouTube every second, amounting to 10 years
of content every day; the genomes of 1000s of people, each of which has a length of
3.8 × 10^9 base pairs, have been sequenced by various labs; Walmart handles more than
1M transactions per hour and has databases containing more than 2.5 petabytes
(2.5 × 10^15) of information; and so on.


Kevin P. Murphy, MACHINE LEARNING: A PROBABILISTIC PERSPECTIVE, p. 1,
© 2012 Massachusetts Institute of Technology, published by The MIT Press


The life of the law has not been logic; it has been experience. 
Oliver Wendell Holmes, Jr., THE COMMON LAW (1881) p. 1 


Holmes, when he articulated a way of thinking about law that departed 
from the prevalent deductive formalism of his day, traced an outline rec- 
ognizable in twenty-first century computer science. The nineteenth cen- 
tury understanding of legal reasoning, which Holmes thought at best 
incomplete, had been that the law, like an algorithm, solves the problems 
given to it in a stepwise, automatic fashion. A well-written law applied 
by a technically competent judge leads to the correct judgment; a bad 
judgment owes to a defect in the law code or in the functioning of the 
judge. Holmes had a contrasting view. In Holmes’s view, the judge con- 
siders a body of information, in the form of existing decisions and also, 
though the judge might not admit it, in the form of human experience at 
large, and in that body discerns a pattern. The pattern is the law itself. As
computer science has developed from algorithm to machine learning, it, 
too, has departed from models that find satisfactory explanation in formal 
proof. In machine learning, the input is data, just as in the law, in Holmes’s
view, the input is experience; and, in both, the task to be performed upon
a given set of inputs is to find patterns therein. Thus in two different fields 
at different times, a transition has occurred from logic applied under fixed 
rules to a search for patterns. 

In the present chapter, we consider more closely the inputs (experience
and data); in Chapter 4 we will consider how, in both law and
machine learning, patterns are found to make sense of the inputs; and 
in Chapter 5 we turn to the outputs, which, as we will see, are predictions 
that emerge through the search for pattern. 


3.1 EXPERIENCE IS INPUT FOR LAW


To what materials does one turn, when one needs to determine the rules 
in a given legal system? Holmes had a distinctive understanding of how 
that question is in fact answered. In The Common Law, which was pub- 
lished sixteen years before The Path of the Law, Holmes started with a 
proposition that would join several of his aphorisms in the catalogue of 
jurists’ favorites: “The life of the law has not been logic; it has been experience.”1
This proposition was further affirmation of Holmes’s view that
logic, on its own, only gets the jurist so far. More is needed if a com- 
prehensive understanding of the legal system is to be reached. Holmes 
proceeded: 


The felt necessities of the time, the prevalent moral and political theories, 
intuitions of public policy, avowed or unconscious, and even the prejudices 
which judges share with their fellow-men, have had a good deal more 
to do than syllogism in determining the rules by which men should be 
governed. The law embodies the story of a nation’s development through 
many centuries, and it cannot be dealt with as if it contained only the 
axioms and corollaries of a book of mathematics.2


We see here again the idea, recurrent in Holmes’s writing, that law is not 
about formal logic, that it is not like mathematics. We also see an expan- 
sion upon that idea, for here Holmes articulated a theory of where law 
does come from. Where Holmes rejected syllogism (dealing with the law
through “axioms and corollaries”), he embraced in its place the systematic
understanding of experience. The experience most relevant to the law
consists of the recorded decisions of organs having authority over the indi- 
vidual or entity subject to a particular legal claim—judgments of courts, 
laws adopted by parliaments, regulations promulgated by administrative 
bodies, and so on. 

Holmes understood experience as wider still, however, for he did not 
invoke only formal legal texts but also “prevalent moral and political the- 
ories, intuitions of public policy... even the prejudices which judges share 
with their fellow-men.”3 The texts of law, for Holmes, were part of the
relevant data but, taken on their own, not enough to go on.

In response to Holmes’s invocation of sources such as political theory
and public policy, one might interject that surely some texts have
undoubted authority, even primacy, over a given legal system—for exam- 
ple, a written constitution, to give the surest case. In Holmes’s view, how- 
ever, one does not reach the meaning even of a constitution through logic 
alone. It is to history there as well that Holmes would have the lawyer 
turn: 


The provisions of the Constitution are not mathematical formulas that have 
their essence in form, they are organic, living institutions transplanted from 
English soil. Their significance is vital, not formal; it is to be gathered not 
simply by taking the words and a dictionary but by considering their origin 
and the line of their growth.4


That Holmes was a keen legal historian is not surprising. When he drew 
attention to “a nation’s development through many centuries,” this was 
directly to his purpose and to his understanding of the law. For Holmes, 
experience in its broadest sense goes into ascertaining the law. 


3.2 DATA IS INPUT FOR MACHINE LEARNING


As we suggested in Chapter 2, a common misperception is that machine 
learning describes a type of decision-making algorithm: that you give the 
machine a new instance to decide, that it does some mysterious algorith- 
mic processing, and then it emits an answer. In fact, the clever part of 
machine learning is in the training phase, in which the machine is given 
a dataset, and a learning algorithm converts this dataset into a digest. 
Holmes talked about a jurist processing a rich body of experience from 
which a general understanding of the law took form. In the case of modern
machine learning, the “experience” is the data; the general understanding
is in the digest, which is stored as millions of finely-tuned parameter
values. We call these values “learnt parameters.” The learnt parameters
are an analogue (though only a rather rough one) to the connection
map of which neurons activate which other neurons in a brain. 

The training dataset—the “experience” from which the system learns— 
is of vital importance in determining the shape that the system eventually 
assumes. Some further words of detail about the training dataset thus are 
in order. 

Computer scientists describe the training dataset in terms of feature 
variables and outcome variables. To see how these terms are used, let us 
take an example of how we might train a machine to classify emails as 
either spam or not spam. The outcome variable in our example is the 
label “spam” or “not-spam.” The feature variables are the words in the 
email. The training dataset is a large collection of emails—together with 
human-annotated labels (human-annotated, because a twenty-first cen- 
tury human, unlike an untrained machine, knows spam when he sees it). 
In the case of legal experience, the facts of a case would be described as 
feature variables, and the judgment would be described as an outcome 
variable. 
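
The vocabulary of feature variables and outcome variables can be made concrete with a small sketch of the spam example. The emails and labels below are invented for illustration, and the representation chosen (a count of each word in the email) is one simple assumption among many possible ways of turning text into feature variables.

```python
from collections import Counter

# Hypothetical, hand-labeled training emails: (text, outcome variable).
training_emails = [
    ("win a free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("please review the draft agenda", "not-spam"),
]


def to_features(text):
    """Turn an email into feature variables: a count of each word it contains."""
    return Counter(text.lower().split())


# The training dataset pairs each email's feature variables with its label.
training_dataset = [(to_features(text), label) for text, label in training_emails]

features, outcome = training_dataset[0]
print(dict(features), outcome)
```

Nothing in this dataset states a rule for recognizing spam; it records only what each email contained and how a human labeled it.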

There is a subfield of machine learning, so-called “unsupervised” 
machine learning, in which the dataset consists purely of feature variables 
without any outcome variables. In other words, the training dataset does 
not include human-annotated labels. The learning process in that kind of 
machine learning consists in finding patterns in the training dataset. That 
kind of machine learning—unsupervised machine learning—corresponds 
to Holmes’s broader conception of experience as including “prevalent 
moral and political theories” and the whole range of factors that might 
shape a jurist’s learning. Classifications are not assigned to the data a priori 
through the decision of some formal authority. They are instead discerned 
in the data as it is examined. 
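
A minimal sketch of pattern finding without labels follows, in Python. It uses a simple clustering method (k-means with two groups), chosen here only for illustration; the numbers are invented and might stand for, say, the count of exclamation marks in each of six emails. No one tells the machine which emails are spam; the two groups are discerned from the data alone.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups by repeated averaging (k-means, k=2)."""
    centers = [min(values), max(values)]  # starting guesses for the two group centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is currently closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled data: only feature variables, no outcome variables.
unlabeled = [1.0, 9.0, 1.2, 9.5, 0.8, 10.1]
centers, groups = two_means(unlabeled)
```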

After the machine has been trained, i.e. after the machine has carried 
out its computations and thus arrived at learnt parameter values from 
the training dataset, it can be used to give answers about new cases. We 
present the machine at that point with new feature variables (the words in 
a new email, which is to say an email not found in the training dataset), 
and the machine runs an algorithm that processes these new feature vari- 
ables together with the learnt parameters. By doing this, the machine 
produces a predicted outcome—in our example, an answer to the
question whether that new email is to be labeled “spam” or “not-spam.” We
will consider further below (in Chapter 5)⁶ the predictive character of
machine learning, which is shared by Holmes’s idea of law.
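
The two phases can be sketched in a few lines of Python. This is a simplified illustration in the style of a naive Bayes classifier, not a description of any particular production system; the toy emails and the function names `train` and `predict` are our own assumptions. The point to notice is that `train` turns the dataset into learnt parameters (one weight per word), and `predict` consults only those parameters and the new feature variables.

```python
import math
from collections import Counter

# Hypothetical toy training dataset of (email text, human-annotated label) pairs.
TRAINING_DATASET = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("draft judgment attached for review", "not-spam"),
]


def train(dataset):
    """Training phase: digest the dataset into learnt parameters (one weight per word)."""
    counts = {"spam": Counter(), "not-spam": Counter()}
    for text, label in dataset:
        counts[label].update(text.lower().split())
    vocab = set(counts["spam"]) | set(counts["not-spam"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    # A positive weight means the word is relatively more frequent in spam.
    # Adding 1 to each count (add-one smoothing) avoids taking the log of zero.
    return {
        w: math.log((counts["spam"][w] + 1) / (totals["spam"] + len(vocab)))
        - math.log((counts["not-spam"][w] + 1) / (totals["not-spam"] + len(vocab)))
        for w in vocab
    }


def predict(learnt_parameters, text):
    """Prediction phase: combine new feature variables with the learnt parameters."""
    score = sum(learnt_parameters.get(w, 0.0) for w in text.lower().split())
    return "spam" if score > 0 else "not-spam"


learnt_parameters = train(TRAINING_DATASET)
print(predict(learnt_parameters, "claim your free prize"))  # spam
print(predict(learnt_parameters, "agenda for the review"))  # not-spam
```

Words never seen in training (such as "the" above) carry no weight, which is one small sign that the machine knows nothing beyond its dataset.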

Data, especially “big data,” is the grist for machine learning. The word 
data is apt. It comes from the Latin datum, “that which is given,” the 
past participle of dare, “to give.” The dataset used to train a machine 
learning system (whether or not classifications are assigned to the data 
in the dataset a priori) is treated as a given in this sense: the dataset is 
stipulated to be the “ground truth”—the source of authority, however 
arbitrary. A machine learning system doesn’t question or reason about 
what it is learning. The predictions are nothing more than statements in 
the following form: “such and such a new case is likely to behave sim- 
ilarly to other similar cases that belong to the dataset that was used to 
train this machine.” It was an oft-noted inclination of Holmes’s to take 
as a given the experience from which law’s patterns emerge.⁷ The central
objection commonly voiced about Holmes’s legal thinking—that he
didn’t care about social or moral values—would apply by analogy to the 
predictions derived from data. We will explore this point and its implica- 
tions in Chapters 6-10 below. 
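
The form of prediction just described, that a new case is likely to behave like similar cases in the training dataset, can be sketched with what is called a nearest-neighbour method. This is one simple technique chosen for illustration, not the only way machine learning systems arrive at predictions; the three-number encoding of each case and the outcome labels are invented.

```python
import math


def predict_by_nearest_case(training_cases, new_features):
    """Return the outcome of the training case closest to the new one."""
    _, nearest_outcome = min(
        training_cases,
        key=lambda case: math.dist(case[0], new_features),  # Euclidean distance
    )
    return nearest_outcome


# Hypothetical training dataset: (feature variables, outcome variable) per case.
TRAINING_CASES = [
    ((0.9, 0.1, 3.0), "granted"),
    ((0.2, 0.8, 1.0), "denied"),
    ((0.8, 0.3, 2.5), "granted"),
]

print(predict_by_nearest_case(TRAINING_CASES, (0.85, 0.2, 2.8)))  # granted
print(predict_by_nearest_case(TRAINING_CASES, (0.1, 0.9, 1.2)))   # denied
```

Nothing in the sketch asks whether the recorded outcomes were right. The dataset is simply taken as given.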

In typical machine learning, the training dataset thus is assembled 
beforehand, the parameters are learnt, and then the trained machine is 
put to use. Holmes’s concept of law follows a similar path. The collected 
experience of society (including its written legal texts) may be likened to 
the training dataset. The learnt experience of a jurist may be likened to 
the parameter values in a machine learning system. The jurist is presented 
new questions, just as the machine (after training has produced the learnt 
parameters) is presented new feature variables, and, from both, outputs 
are expected. 

Jurists will naturally keep accumulating experience over time, both 
from the cases they have participated in and from other sources. In a 
particular variant of machine learning, a machine likewise can undergo 
incremental training once it has been deployed. This is described as online 
learning, denoting the idea that the machine has “gone online” (i.e., 
become operational) and continues to train. On grounds of engineering 
simplicity it’s more common, so far, to train the machine and then deploy 
it without any capability for online learning.⁸
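
As a rough illustration of online learning, the Python sketch below keeps adjusting its learnt parameters after deployment, one example at a time. The update rule is a perceptron-style rule picked for its brevity; the class name `OnlineSpamFilter` and the example emails are assumptions made for this sketch.

```python
class OnlineSpamFilter:
    """A deployed filter that continues to train on each new labeled email."""

    def __init__(self):
        self.weights = {}  # learnt parameters: one weight per word, initially none

    def predict(self, text):
        score = sum(self.weights.get(w, 0.0) for w in text.lower().split())
        return "spam" if score > 0 else "not-spam"

    def learn_one(self, text, label):
        """Perceptron-style update: change the weights only after a wrong prediction."""
        if self.predict(text) != label:
            step = 1.0 if label == "spam" else -1.0
            for w in text.lower().split():
                self.weights[w] = self.weights.get(w, 0.0) + step


spam_filter = OnlineSpamFilter()
spam_filter.learn_one("free prize now", "spam")          # first experience
spam_filter.learn_one("free lunch at noon", "not-spam")  # a later correction
print(spam_filter.predict("prize now"))      # spam
print(spam_filter.predict("lunch at noon"))  # not-spam
```

The second call shows the accumulating character of the process: the word "free" first counts towards spam and is then revised in light of a later example.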

There is perhaps an aspect of Holmes’s understanding of the law that 
does not (yet) have any counterpart in machine learning, even its online 
variant: a legal decision is made in anticipation of how it will be used
as input for future decisions. An anticipatory aspect is not present in 
machine learning in its present state of the art. We will explore this idea 
in Chapter 9. 


3.3 THE BREADTH OF EXPERIENCE 
AND THE LIMITS OF DATA 


Another distinction is that the experience Holmes had in mind is consid- 
erably broader than the typical training datasets used in machine learn- 
ing, and it is less structured. The machine learning system is constrained 
to receive inputs in simple and rigid formats. For example, a machine 
receives an input in the form of an image of prespecified size or a label 
from a prespecified (relatively) small set of possibilities; its output is an 
image or a label of the same form. The tasks that machine learning can 
handle, in the present state of the art, are those where the machine is 
asked to make a prediction about things that are new to the machine, 
but whose newness does not exceed the parameters of the data on which 
the machine was trained. Machine learning is limited in this respect. It 
is limited to data in a particular sense—data as a structured set of inputs; 
whereas the experience in which jurists find the patterns of law is of much 
wider provenance and more varied shape. 

Machine learning, however, is catching up. There is ongoing research 
on how to incorporate broad knowledge bases into machine learning sys- 
tems, for example to incorporate knowledge about the world obtained 
from Wikipedia. Any very large and highly variegated dataset could be an 
eventual training source, if machine learning gets to that goal. The case 
reports of a national legal system would be an example, too, of the kind 
of knowledge base that could be used to train a machine learning sys- 
tem. To the extent that computer science finds ways to broaden the data 
that can be used to train a machine learning system, the training dataset
will come that much more to resemble Holmes’s concept of experience 
as the basic stuff in which are found the patterns—texts of all kinds, and 
experience of all kinds. 

Now, we turn to finding patterns, which is to say how prediction is 
arrived at from the data that is given. 
NOTES
1. Holmes (1881) op. cit. Prologue, p. xii, n. 9. 
2. Id. 
3. When writing for the Supreme Court on a question of the law of Puerto
Rico, Justice Holmes reiterated his earlier idea about experience, here
concluding that the judge without the experience ought to exercise restraint.
The range of facts that Holmes identified as relevant are similar to those he
identified forty years earlier in The Common Law:


This Court has stated many times the deference due to the under- 
standing of the local courts upon matters of purely local concern... 
This is especially true in dealing with the decisions of a Court inher- 
iting and brought up in a different system from that which prevails 
here. When we contemplate such a system from the outside it seems 
like a wall of stone, every part even with all the others, except so 
far as our own local education may lead us to see subordinations to 
which we are accustomed. But to one brought up within it, varying 
emphasis, tacit assumptions, unwritten practices, a thousand influ- 
ences gained only from life, may give to the different parts wholly 
new values that logic and grammar never could have gotten from 
the books. Diaz et al. v. Gonzalez et al., 261 U.S. 102, 105-106, 43 
S.Ct. 286, 287-88 (Holmes, J.) (1923). 


Legal writers, in particular positivists, “have long debated which facts 
are the important ones in determining the existence and content of law.” 
Barzun, 69 STAN. L. Rev. 1323, 1329 (2017). Holmes’s writings support 
a broad interpretation of “which facts...” he had in mind, and he was 
deliberate when he said that it is only “[t]he theory of our legal system... 
that the conclusions to be reached in a case will be induced only by evidence 
and argument in open court, and not by any outside influence”: Patterson 
v. Colorado ex rel. Att'y Gen., 205 U.S. 454, 462 (1907) (emphasis added).


4. Gompers v. United States, 233 U.S. 604, 610 (1914).
5. See Rabban (2013) 215-68.
6. Chapter 5, pp. 54-57.
7. See further Chapter 10, pp. 114-119.
8. Kroll et al., op. cit., n. 76, at 660, point out that online learning systems
pose additional challenges for algorithmic accountability.
CHAPTER 4


Finding Patterns as the Path from Input 
to Output 


As [Judea Pearl] sees it, the state of the art in artificial intelligence today is merely 
a souped-up version of what machines could already do a generation ago: find 
hidden regularities in a large set of data. “All the impressive achievements of deep 
learning amount to just curve fitting,” he said recently. [...] 

The way you talk about curve fitting, it sounds like you’re not very impressed 
with machine learning [remarks the interviewer]. “No, I’m very impressed, because
we did not expect that so many problems could be solved by pure curve fitting. It 
turns out they can.” 


Judea Pearl as interviewed by Kevin Hartnett, 
QUANTA MAGAZINE (May 15, 2018)¹


Judea Pearl won the 2011 Turing Award, the “Nobel Prize for computer 
science,” for his work on probabilistic and causal reasoning. He describes 
machine learning as “just curve fitting,” the mechanical process of finding 
regularities in data. The term comes from draftsmen’s use of spline curves, 
flexible strips made from thin pieces of wood or metal or plastic, to draw 
smooth lines through a set of pins. 

In this chapter, we posit a further, specific analogy. We posit an analogy 
between Pearl’s description of machine learning and Holmes’s view of law. 
According to Holmes, the main task in law is finding patterns in human 
experience; law should not be seen simply as an exercise in mathematical 
logic. Likewise, machine learning should be thought of as curve fitting, 
i.e. as finding regularities in large datasets, and not as algorithms that 
execute a series of logical steps. 
We described in Chapter 2 why it is not helpful to view machine
learning as an algorithm. It is not an adequate explanation of what makes
machine learning the powerful tool it has become today to say that it is 
about executing a series of logical instructions, composed in a piece of 
programming code. To understand what makes machine learning distinc- 
tive one has to start with the role of datasets as input, a role we described 
in Chapter 3 above, and which may be analogized to Holmes’s view of 
the jurist’s experience. In this chapter we now examine pattern finding 
more closely, first in law then in machine learning, to see how far the 
analogy might go. 


4.1 PATTERN FINDING IN LAW 


Holmes said in The Path of the Law that identifying the law means
“follow[ing] the existing body of dogma into its highest generalizations.”²
Two years after The Path, Holmes described law as a proposition that
emerges when certain “ideals of society have been strong enough to reach
that final form of expression.”³ To describe the law as Holmes did is
to call for “the scientific study of the morphology and transformation
of human ideas in the law.”⁴ If the pattern is strong enough, then the
proposition emerges, the shape becomes clear. 

Holmes returned a number of times to this idea that law is to be identi- 
fied in patterns in human nature and practice. In a Supreme Court judg- 
ment in 1904, he addressed the right of “title by prescription.” Under 
that right, a sustained and uncontested occupation of land can override 
a legal title to that land. Prescription is thus an example where the law 
explicitly recognizes that a pattern of reality on the ground is the law. 
Holmes described prescription like this: 


Property is protected because such protection answers a demand of human 
nature, and therefore takes the place of a fight. But that demand is not 
founded more certainly by creation or discovery than it is by the lapse 
of time, which gradually shapes the mind to expect and demand the
continuance of what it actually and long has enjoyed, even if without right,
and dissociates it from a like demand of even a right which long has been
denied.⁵


This way of describing title by prescription evoked the search for pattern 
in experience. How society actually behaves and how people think about 
that behavior are facts in which a pattern may be discerned. If the pattern 
is well enough engrained, if it “shapes the mind” to a sufficient degree,
and one knows how to discern it, then legal conclusions follow. 

In what is perhaps his most famous dissenting opinion, that in Lochner 
v. New York, Holmes applied this idea about the ideals of society and the 
shape of the law a good deal further. The Supreme Court concluded that 
a New York state statute limiting the hours employees worked in a bakery 
violated the freedom of contract as embodied in the 14th Amendment. 
Holmes, as against his colleagues’ formal reading of the 14th Amend- 
ment, argued that one should interpret the constitutional right in the 
light of the patterns of belief discernible in society: 


Every opinion tends to become a law. I think that the word ‘liberty’ in 
the 14th Amendment, is perverted when it is held to prevent the natural 
outcome of a dominant opinion, unless it can be said that a rational and 
fair man necessarily would admit that the statute proposed would infringe 
fundamental principles as they have been understood by the traditions of 
our people and our law.⁶


In the land title case, the rule of title by prescription acknowledged that 
the pattern in human experience is the law. A formal rule, exception- 
ally, there corresponded to what Holmes thought law is. In Lochner, 
by contrast, there was no formal rule that says you are to interpret 
the 14th Amendment by reference to “dominant opinion.” The reading 
that Holmes arrived at in Lochner thus illustrates just how far-reaching 
Holmes’s conception of the law as a process of pattern finding was. Even 
the plain text of the law, which a logician might think speaks for itself, 
Holmes said calls for historical analysis. The meaning of a text is not 
to be found only in its words, but in the body of tradition and opin- 
ion around it: “A word [in the Constitution] is not a crystal, transparent 
and unchanged, but the skin of a living thought.”⁷ Holmes believed that
we identify the law by systematically examining the shape of what exists 
already and what might later come—“the morphology and transformation 
of human ideas.” 

A good jurist reaches decisions by discerning patterns of tradition 
and practice. The bad jurist treats cases as exercises in logical deduction. 
According to Holmes, “a page of history is worth a volume of logic.”⁸
4.2 SO MANY PROBLEMS CAN BE SOLVED
BY PURE CURVE FITTING 


Judea Pearl expressed surprise that so many problems could be solved 
by curve fitting. And to someone from outside machine learning, it may 
seem preposterous that Holmes’s pattern finding might be analogous to 
drawing a line through a collection of points, as illustrated in the 
figure above. To give some idea of the scope of what machine learning 
researchers express as curve fitting, we now consider some applications. 
We have chosen applications from law and data to keep with our analogy 
to legal pattern finding—but curve fitting applications from any number 
of application areas, such as those from the data science careers website 
that we listed in Chapter 1, would support the same point. 

Our first application relates to Holmes’s famous epigram “The prophe- 
cies of what the courts will do in fact, and nothing more pretentious, are 
what I mean by the law.”⁹ Suppose it were possible to draw a chart
summarizing the body of relevant case law. Each case would be assigned an x
coordinate encoding the characteristics of the case (the type of plea, the 
set of evidence, the history of the judge, and so on) and a y coordinate 
encoding the outcome of the case (the decision reached, the sentence, 
and so on), and a point would be plotted for each case at its assigned 
x and y coordinates. We could then draw a smooth curve that expresses 
how the y coordinate varies as a function of the x coordinate—i.e. find 
the pattern in the dataset—and we could use this curve to predict the 
likely outcome of a new case given its x coordinate. 
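
The thought experiment can be sketched in Python with the simplest class of curve, a straight line fitted by least squares. The numbers are invented: x might be a single score summarizing the characteristics of a case and y the sentence in months. Real case law would of course not reduce to one number per case.

```python
def fit_line(points):
    """Ordinary least squares: return (slope, intercept) of the best-fitting line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical chart of past cases: (x coordinate, y coordinate).
PAST_CASES = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]

slope, intercept = fit_line(PAST_CASES)   # the pattern found in the dataset
new_case_x = 5.0
predicted_y = slope * new_case_x + intercept  # likely outcome of the new case
```

For these four points the fitted line has a slope of about 1.98 and an intercept of about 0.05, so the predicted outcome for the new case is about 9.95.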

This may sound preposterous, a law school version of plotting poets on 
a chalk board like the English teacher in Dead Poets Society did to ridicule 
a certain kind of pedantry.¹⁰ However, it is an accurate description of
how machines are able to accomplish such tasks as translating text or 
captioning images. A chalk board has only two dimensions; a machine 
learning system works in many more dimensions, represented through 
mathematical functions. The coordinates are expressed in sophisticated 
geometrical spaces (instead of x, use x1, x2, ..., xn for some large number
of dimensions n) that go beyond human visualization abilities; but the 
method is nothing more than high dimensional curve fitting. 

The above application is a thought experiment. Here are some actual 
examples borrowed from a recent book on Law As Data¹¹:


(i) Predicting whether a bill receives floor action in the legislature, 
given the party affiliation of the sponsor and other features, as well 
as keywords in the bill itself. 

(ii) Predicting the outcome of a parole hearing, given keywords that 
the inmate uses. 

(iii) Predicting the case-ending event (dismissal, summary judgement, 
trial, etc.), given features of the lawsuit such as claim type or plain- 
tiff race or plaintiff attorney’s dismissal rate. 

(iv) Predicting the topic of a case (crime, civil rights, etc.) given the 
text of an opinion. (To a human with a modicum of legal train- 
ing this is laughably simple, but for machine learning it is a 
great achievement to turn a piece of text into a numerical vector
(x1, x2, ..., xn) that can be used as the x coordinate for curve
fitting. The mathematics is called “doc2vec”.) 

(v) Predicting the decision of an asylum court judge given features of 
the case. (If a prediction can be made based on features revealed 
in the early stages of a case, and if the prediction does not improve 
when later features are included, then perhaps the judge was sleep- 
ing through the later stages.) 
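
To give a feel for what turning text into a numerical vector involves, here is a Python sketch of a much simpler stand-in known as a bag of words. It is not doc2vec, which learns its vectors from data; it merely counts words. The two sample sentences and the function names are invented for illustration.

```python
def build_vocabulary(documents):
    """Give each distinct word a fixed position in the vector."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_vector(document, vocabulary):
    """Bag-of-words vector (x1, ..., xn): entry i counts occurrences of word i."""
    vector = [0] * len(vocabulary)
    for w in document.lower().split():
        if w in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[w]] += 1
    return vector


# Hypothetical snippets of opinions used to fix the vocabulary.
OPINIONS = [
    "the defendant was convicted of theft",
    "the plaintiff alleged discrimination",
]

vocabulary = build_vocabulary(OPINIONS)
x = to_vector("the defendant the theft", vocabulary)
print(x)  # [0, 0, 1, 0, 0, 0, 2, 1, 0]
```

Once every opinion is a vector of the same length, it can serve as the x coordinate for any of the predictions listed above.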


We have used the word “predict” for all of these examples. Most of these 
tasks are predictive in the sense of forecasting, but in case (iv) the
word “predict” might strike a layperson as odd. In machine learning, 
the word “predict” is used even when the outcome being predicted is 
already known; what matters is that the outcome is not known to the 
machine making the prediction. Philosophers use the words “postdiction” 
or “retrodiction” for such cases. In Chapter 5 we address in detail why 
computer scientists use the language of prediction to describe the outputs
of a machine learning system—and why Holmes used it to describe the 
outputs of law. 


4.3 NOISY DATA, CONTESTED PATTERNS
Holmes wrote that “a page of history is worth a volume of logic.” When
lawmakers ask for “the logic involved”¹² in automated decision making,
they should really be asking for “a story about the training dataset.” It 
is the data—that which is a given and thus came before—that matters in 
machine learning, just as the history is what matters in Holmes’s idea of 
law—not some formal process of logic. 

But history can be contested. Even when parties agree on the facts, 
there may be multiple narratives that can be fitted.¹³ Likewise, for a given
dataset, there may be various curves that may be fitted, as the figure above 
illustrates. We might wish to remove the subjectivity, leaving us with a 
volume of irrefutable logic proving that the decision follows necessarily 
from the premises, but that is the nature neither of law nor of machine 
learning. The phrase “story about the training dataset” is meant to remind 
us of this. 
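
A small Python sketch shows how two defensible curves can be fitted to one dataset and still disagree about a new case. The four data points are invented. One curve is the least-squares straight line; the other is a polynomial that passes exactly through every point (built by Lagrange interpolation). Both choices are ours, made only to illustrate the point.

```python
def straight_line(points):
    """Least-squares straight line through the points, returned as a function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return lambda x: mean_y + slope * (x - mean_x)


def exact_polynomial(points):
    """Polynomial passing through every point exactly (Lagrange interpolation)."""
    def curve(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return curve


# Hypothetical dataset on which both curves are fitted.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 4.0)]

line = straight_line(DATA)
wiggle = exact_polynomial(DATA)
line_prediction = line(4.0)      # about 4.5
wiggle_prediction = wiggle(4.0)  # about 15
```

The same data thus supports two quite different predictions for x = 4, depending on which story about the data one chooses to tell.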

For some datasets, there may be a clear curve that fits all the data 
points very closely. In Holmes’s language, this corresponds to finding 
patterns in experience that have attained the “final form of expression.” 
The process of finding the law, as Holmes saw it, is the process of finding 
a pattern strong enough to support such “highest generalizations.” Not 
all “existing dogma” lends itself, however, to ready description as law; one 
does not always locate in the body of experience a “crystal, transparent.” 
Likewise, not all datasets have a well-fitting curve; the y coordinates may 
simply be too noisy. 

Some writers refer to machine learning systems as “inferring rules from 
data,” “deriving rules from data,” and the like.¹⁴ We recommend the
phrase “finding patterns in data,” because it is better here to avoid any
suggestion of clean precise law-like rules. The patterns found by machine 
learning are not laws of nature like Newton’s laws of motion, and they are 
not precise stipulative rules in the sense of directives laid down in statutes. 
They are simply fitted curves; and if the data is noisy then the curves will 
not fit well. 
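
How well a curve fits can be given a number. The Python sketch below fits the best straight line to two invented datasets and reports the average squared gap between the line and the data points, one common measure among several. The function name and both datasets are assumptions made for illustration.

```python
def error_of_best_line(points):
    """Fit the least-squares line and return its mean squared error on the data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    # Average of the squared vertical distances between each point and the line.
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / n


CLEAR_PATTERN = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # hypothetical
NOISY_PATTERN = [(1.0, 8.0), (2.0, 1.0), (3.0, 7.0), (4.0, 2.0)]  # hypothetical

clear_error = error_of_best_line(CLEAR_PATTERN)  # essentially zero
noisy_error = error_of_best_line(NOISY_PATTERN)  # about 7.45
```

In the first dataset the line is a faithful summary. In the second a best-fitting line still exists, but it says little about any individual point.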

While we have noted here that pattern finding is an element shared 
by machine learning and law, we should also note a difference. Law as 
Holmes saw it, and as it must be seen regardless of one’s legal philosophy, 
is an activity carried out by human beings. Law involves intelligence and 
thought. Machine learning is not thought. Once the human programmer 
has decided which class of curves to fit, the machine “learning” process is 
nothing more than a mechanical method for finding the best fitting curve 
within this class. Caution about anthropomorphizing machine learning is 
timely because there is so much of it, not just in popular culture, but in 
technical writing as well—and it obscures what machine learning really 
is. Machine learning is not thought. It is not intelligence. It is not brain 
activity. Pearl described it as curve fitting to emphasize this point, to make 
clear it is nothing more than a modern incarnation of the draftsman’s 
spline curve. That description does not entail any modesty at all about 
what machine learning can do. It only serves to illustrate how it is that 
machine learning does it. 
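
The mechanical character of the “learning” step can be seen in a short Python sketch. Here the human choice is the class of curves, namely straight lines through the origin, y = w * x. The machine then does nothing but nudge w repeatedly in whichever direction reduces the squared error, a procedure known as gradient descent. The data, the step count and the learning rate are all assumptions chosen for illustration.

```python
def fit_by_gradient_descent(points, steps=2000, learning_rate=0.01):
    """Search the class y = w * x for the value of w with the least squared error."""
    w = 0.0  # initial guess for the single learnt parameter
    for _ in range(steps):
        # Slope of the mean squared error with respect to w at the current guess.
        gradient = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * gradient  # step downhill
    return w


# Hypothetical dataset lying exactly on the line y = 2x.
POINTS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

learnt_w = fit_by_gradient_descent(POINTS)  # converges towards 2.0
```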
CHAPTER 5


Output as Prophecy 


In the preceding chapter we considered some of the purposes to which 
machine learning might be put—for example, to predict the topic of a 
court case given words that the judge used in the court’s written judg- 
ment—and we described pattern finding as the method behind prediction. 
More important than the method however is the goal, in this example “to 
predict the topic,” and in particular the keyword predict. We introduced 
that word in the preceding chapter to begin to draw attention to how 
computer scientists use it when they engineer machine learning systems. 
In machine learning systems, predictive accuracy is the be-all and end- 
all—the way to formulate questions, the basis of learning algorithms, and 
the metric by which the systems are judged. In this chapter we consider 
prediction, both in Holmes’s view of law and in the machine learning 
approach to computing. 

In Holmes’s view of law, prediction is central. His answer to the ques- 
tion, What constitutes the law? has become one of the most famous epi- 
grams in all of law: 


The prophecies of what the courts will do in fact, and nothing more
pretentious, are what I mean by the law.¹


Holmes’s interest in the logic and philosophy of probability and statistics
has come more to light thanks to recent scholarship²; he immersed
himself early in his career in those subjects. Holmes’s use of the word
“prophecy” was deliberate. It accorded with his overall view of law by 
getting away from the scientific and rational overtones of “prediction,” 
even as he used that word too. Arguably, given how elusive explanations 
have been of how machine learning systems arrive at the predictions they 
make, “prophecy” is a good term in that context too. 

We expand in this chapter on Holmes’s idea that prophecies constitute 
the law, and then we return to prediction in machine learning. 


5.1 PROPHECIES ARE WHAT LAW IS


Holmes’s famous epigram has been widely repeated, but it is not widely 
understood. Taken in isolation from The Path of the Law, where Holmes 
set it down, and in isolation from Holmes’s development as a thinker, it 
might sound like no more than a piece of pragmatic advice to a practicing 
lawyer: don’t get carried away by the cleverness of your syllogisms; ask 
yourself, instead, what the judge is going to do in your client’s case. If 
that is all it meant, then it would be good advice, but it would not be a 
concept of law. Holmes had in mind a concept of law. The epigram needs 
to be read in context: 


The confusion with which I am dealing besets confessedly legal concep- 
tions. Take the fundamental question, What constitutes the law? You will 
find some text writers telling you that it is something different from what 
is decided by the courts of Massachusetts or England, that it is a system of 
reason, that it is a deduction from principles of ethics or admitted axioms 
or what not, which may or may not coincide with the decisions. But if we 
take the view of our friend the bad man we shall find that he does not 
care two straws for the axioms or deductions, but that he does want to 
know what the Massachusetts or English courts are likely to do in fact. I 
am much of this mind. The prophecies of what the courts will do in fact, 
and nothing more pretentious, are what I mean by the law.³


Holmes was contrasting “law as prophecy” to “law as axioms and 
deductions.” He saw an inductive approach to law—the pattern finding
approach that starts with data or experience—as more than a way to improve
upon or augment legal formalism. He saw it as a corrective. The declaration
in his dissent in the Lochner case a few years after The Path of the Law that
“[g]eneral propositions do not decide concrete cases”⁴ was not just to
say that the formal, deductive approach is insufficient; it was to say that
formalism gets in the way. 

The central concern for Holmes was the reality of decision, the
output that a court might produce. The realism or positivism in this 
understanding of law contrasted with the formalist school that had long 
prevailed. To shift the concern of lawyers in this way was to lessen the 
role of doctrine, of formal rules, and to open a vista of social and histor- 
ical considerations heretofore not part of the law school curriculum and 
ignored, or at any rate not publicly acknowledged, by lawyers or judges. 
Jurists have been divided ever since as to whether the shift of conception 
was for better or worse. Whatever one’s assessment of it, the concept of 
law as Holmes expressed it continues to influence the law. 

There is more still to Holmes’s epigram about prophecies. True, 
the contrast it entails between the inductive method and the deductive 
method alone has revolutionary implications. But Holmes was not merely 
concerned with what method “our friend the bad man” (or indeed the 
bad man’s lawyer) should employ to predict the outcome of a case. He 
wasn’t writing a law practice handbook. He was interested in individual 
encounters with the law to be sure,⁵ but this was because he sought to
reach a general understanding of law as a system. Holmes’s invocation of 
prophecies, like his use of terms from logic and mathematics, was a
memorable use of language, but it was more than rhetoric: it was at the core of
Holmes’s definition of law. He referred to the law as “systematized predic- 
tion.” This was to apply the term “prediction” broadly—indeed across 
the legal system as a whole. Holmes was not sparing in his use of the word 
“prophecy” when defining the law. The word “prophesy” or its derivatives
appear nine times in The Path of the Law.⁷ He used it in the same sense
when writing for the Supreme Court.⁸ Holmes’s concern with prediction
is traceable in his other writings too.⁹ The heart of Holmes’s insight, and
what has so affected jurisprudence since, is that the law is prediction.¹⁰
Prophecy does not refer solely to the method for predicting what courts 
will do. Prophecy is what constitutes the law. 

Prophecy of what, by whom, and on the basis of which input data? 

Holmes gave several illustrations. For example, he famously described 
the law of contract as revolving around prediction: “The duty to keep a 
contract at common law means a prediction that you must pay damages 
if you do not keep it, and nothing else.”¹¹ He stated his main thesis in
similar terms: “a legal duty so called is nothing but a prediction that if a 
man does or omits certain things he will be made to suffer in this or that 
way by judgment of the court.”¹² This is a statement about “legal duty”
irrespective of the content of the duty. It thus just as well describes any 
duty that exists in the legal system. 

We think that Holmes’s concept of law as prediction indeed is 
comprehensive. Many jurists don’t see it that way. Considering how 
Holmes understood law to relate to decisions taken by courts, one sees 
why his concept of law-as-prophecy often has received a more limited 
interpretation. 

Holmes wrote that “the object of [studying law], then, is prediction, 
the prediction of the incidence of the public force through the
instrumentality of the courts.”¹³ Making the equation directly, he wrote, “Law is a
statement of the circumstances, in which the public force will be brought 
to bear upon men through the courts....”; a “word commonly confined 
to such prophecies... addressed to persons living within the power of the 
courts.”¹⁴ It is often assumed that Holmes’s description here does not
account for the decisions of the highest courts in a jurisdiction, courts 
whose decisions are final. After all, in a system of hierarchy, the organ 
at the apex expects its commands to be obeyed. To call decisions that 
emanate from such quarters “predictions” seems to ignore the reality of 
how a court system works. In a well-functioning legal system, a judgment 
by a final court of appeal saying, for example, that the police are to release 
such and such a prisoner, should lead almost certainly to that outcome. 
The court commands it; the prisoner is released. 

In two respects, however, one perhaps trivial but the other assuredly 
significant, the highest court’s statements, too, belong to the concept of 
law as prophecy. 

First, even in a well-functioning legal system, the court’s decision is 
still only a prediction. As outlandish as the situation would be in which 
the police ignored the highest court, it is a physical possibility. A statisti- 
cian might say that the probability is very high (say, 99.9999%) that the 
highest court’s judgment that the law requires the prisoner to be released 
will in fact result in an exercise of public power in accordance with that 
judgment. We will say more below about the relation between probability
and prediction.¹⁵ Leaving that relation aside for the moment, a judgment
even of the highest court is a prediction in Holmes’s sense. It is a
prediction in this way: the implementation of a judgment by agents of 
public power is an act of translation, and in that act the possibility exists 
for greater or lesser divergence from the best understanding of what the 
judge commanded. So the definition of law as prophecy is instanced in 
the chance that the “public force” will not properly implement the judi-
cial decision. In a well-functioning legal system, the chance is remote.
In legal systems that don’t function well, the predictive character of final 
judgments is more immediate, because the risk in those systems is greater 
that the public force will not properly implement the courts’ commands. 
“Finality” in some judicial systems is more formal than real.16

The further respect in which the concept of law as prophecy is com- 
prehensive comes to view when we consider how judges decide cases and 
how advocates argue them. In deciding a case, a judge will have in mind 
how that decision is likely to be interpreted, relied upon, or rejected, by 
future courts and academics and public opinion, as well as by the instru- 
ments of public force. The barrister, for her part, in deciding what line 
of argument to pursue, will have in mind how the judge might be influ- 
enced, and this in turn requires consideration of the judge’s predictions 
about posterity. Holmes thus described a case after the highest court had 
decided it as still in the “early stages of law.”17 As Kellogg puts it, Holmes
situated a case “not according to its place on the docket but rather in the
continuum of inquiry into a broader problem.”18

The law is a self-referential system, whose rules and norms are conse-
quences of predictions of what those rules and norms might be.19 Some
people participate in the system in a basic and episodic way, for example 
the “bad man” who simply wants advice from a solicitor about likely out- 
comes in respect of his situation. Some people participate in a formative 
way. The apex example is the judge of the final court of appeal whose pre- 
dictions about the future of the legal system are embodied in a judgment 
which she expects as a consequence of her authority in the legal system 
to be a perfect prediction of the exercise of public power in respect of 
the case. But her outlook, indeed her self-regard as a judge, entails more 
than that; a judge does more than participate in disconnected episodes 
of judging: she hopes that any judgment she gives in a case, because she 
strives for judgments that withstand the test of time, will be a more or less 
accurate prediction of how a future case bearing more or less likeness to 
the case will be decided. The judge describes her judgment as command, 
not as prophecy; but the process leading to it, and the process as the 
judge hopes it will unfold in the future, is predictive. Law-as-prophecy, 
understood this way, has no gap. 

Holmes’s claim, as we understand it, holds law to be predictive through 
and through. Prophecy is what law is made of. The predictive character of 
law, in this constitutive sense, is visible in the process of judicial decision, 
regardless of what level of the judiciary is deciding; and it is visible in all
other forms of legal assertion as well. Prophecy embraces all parts of legal 
process. 

So everyone who touches the law is making predictions, from the self- 
interested “bad man” to the judge in the highest court of the land, and 
they make predictions about the full range of possible outcomes. The 
experience that influences their predictions, as we saw in Chapter 3,20
Holmes understood to be wide, and the new situations that they make
predictions about are unlimited. People on Holmes’s path of the law thus
engage in tasks much broader than standard machine learning tasks. As
we also discussed in Chapter 3,21 machine inputs, while they consist in
very large datasets (“Big Data”), are limited to inputs that have been 
imparted a considerable degree of structure—a degree of structure almost 
certainly lacking in the wider (and wilder) environment from which expe- 
rience might be drawn. Machine outputs are correspondingly limited as 
well. Let us now further explore the machine learning side of the anal- 
ogy—and its limits. 


5.2 PREDICTION IS WHAT 
MACHINE LEARNING OUTPUT IS 


Holmes, writing in 1897, obviously did not have machine learning in 
mind. Nevertheless, his idea that prophecy constitutes the law has remark- 
able resonance with machine learning, a mechanism of computing that, 
like law as Holmes understood it, is best understood as constituted by 
prediction. 

The word prediction is a term of art in machine learning. It is used like 
this: 


In a typical scenario, we have an outcome measurement, usually quantita- 
tive (such as a stock price) or categorical (such as heart attack/no heart 
attack), that we wish to predict based on a set of features (such as diet 
and clinical measurements). We have a training set of data, in which we 
observe the outcome and feature measurements for a set of objects (such 
as people). Using this data we build a prediction model, or learner, which 
will enable us to predict the outcome for new unseen objects. A good 
learner is one that accurately predicts such an outcome.22


Though in common parlance the term “prediction” means forecasts
(that is to say, statements about future events), in machine learning the
term has a wider meaning. We have touched on the wider meaning in
Chapter 4 and at the opening of the present chapter. Let us delve a little 
more into that wider meaning now. 

It is true that some machine learning outputs are “prediction” in the 
sense in which laypersons typically speak: “Storm Oliver will make landfall 
in North Carolina”23 or “the stock price will rise 10% within six months.”
Other outputs are not predictions in the layperson’s sense. Indeed, the 
main purposes for which machine learning is used do not involve pre- 
dictions of that kind—purposes like classifying court cases by topic or 
controlling an autonomous vehicle. Whatever the purposes for which it 
is used, machine learning involves “prediction” of the more general kind 
computer scientists denote with that term. 

The essential feature of prediction in machine learning is that it should 
concern “the outcome for new unseen objects,” i.e. for objects not in 
the training set. Thus, for example, if the training set consists of labelled 
photographs, and if we treat the pixels of the photograph as features and 
the label as the outcome, then it is prediction when the machine learning 
system is given a new photograph as input data and it outputs the label 
“kitten.” In machine learning prediction, “pre-” simply refers to before 
the true outcome measurement has been revealed to the machine learning 
system. The sense of “pre-” in “prediction” holds even though other par- 
ties might well already know the outcome. For example, the computer 
scientist might well already know that the new photograph is of a tiger, 
not a kitten. That assignment of label-to-picture has already happened, 
but the machine learning system has not been told about it at the point 
in time when the system is asked to predict. Philosophers of science use 
the terms “postdiction” or “retrodiction” to refer to predicting things 
that have already happened.24 These words are not used in the machine
learning community, but the concept behind them is much what that 
community has in mind when it talks about prediction. 

A significant part of the craft of machine learning is to formulate a 
task as a prediction problem. We have already described how labelling a 
photograph can be described as prediction. A great many other examples 
may be given. Translation can be cast as prediction: “predict the French 
version of a sentence, given the English text,” where the training set is a 
human-translated corpus of sentences. Handwriting synthesis can as well. 
Given a dataset of handwritten text, recorded as the movements of a pen 
nib, and given the same text transcribed into text in a word processor,
the task of handwriting synthesis can be cast as prediction: “predict the
movements of a pen nib, given text from a word processor.” As Judea
Pearl observed in the interview with which we opened Chapter 4,25 it is
truly remarkable how many tasks can be formulated this way. In the social
sciences, it is “a rather new epistemological approach [...] and research
agendas based on predictive inference are just starting to emerge.”26 A
theory of law based on predictive inference, however, emerged over a 
century ago: Holmes theorized law to be constituted by prophecy. So 
too might we say that machine learning is constituted by prediction. 

Moreover, prediction is not just the way that machine learning tasks 
are formulated. It is also the benchmark by which we train and evaluate 
machine learning systems in the performance of their tasks. The goal of 
training is to produce a “good learner,” i.e. a system that makes accurate 
predictions. Training is achieved by measuring the difference between the 
machine’s predictions (or postdictions, as the philosophers would say) and 
the actual outcomes in the training dataset; and iteratively tweaking the 
machine’s parameter values so as to minimize the difference. The machine 
that reliably labels tigers as “tigers” has learned well and, at least for that 
modest task, needs no more tweaking. The machine that labels a tiger 
as a “kitten” needs tweaking. The one that labels a tiger as “the forests 
of the night,” though laudable if its task had been to predict settings in 
which tigers are found in the poetry of William Blake, needs some further 
tweaking still to perform the task of labelling animals. This process of
iterative tweaking, as we noted in Chapter 2, is what is known as gradient
descent,27 the backbone of modern machine learning. Thus, a mechanism
of induction, not algorithmic logic, is at the heart of machine learning, 
much as Holmes’s “inductive turn” is at the heart of his revolutionary 
idea of law. 
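
To make the idea of iterative tweaking concrete, the following is a minimal sketch in Python, not a description of any particular production system. It assumes a toy training set of five points, a one-feature model with two parameters (w and b), and mean squared error as the measure of the difference between predictions and outcomes. The function names train and mean_squared_error, the learning rate, and the number of steps are illustrative choices of ours.

```python
# Toy gradient descent: fit predictions w * x + b to outcomes in a training set.
# All names and numbers here are illustrative assumptions, not from the text.

def mean_squared_error(w, b, xs, ys):
    """Average squared difference between predictions and actual outcomes."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.01, steps=5000):
    """Iteratively tweak the parameters w and b to shrink the error."""
    w, b = 0.0, 0.0  # arbitrary starting parameter values
    n = len(xs)
    for _ in range(steps):
        # Difference between the machine's predictions and the actual outcomes.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training set: input features xs and outcome measurements ys.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print("error before training:", mean_squared_error(0.0, 0.0, xs, ys))
print("error after training: ", mean_squared_error(w, b, xs, ys))
print("prediction for a new, unseen input x = 5:", w * 5 + b)
```

Nothing in the loop states a rule relating x to y. The parameters are simply adjusted, step by step, until the predictions sit close to the outcomes in the training data.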

It is not machine learning’s fundamental characteristic that it can be 
used to forecast future events—when will the next hurricane occur, where 
will it make landfall? One doesn’t need machine learning to make fore- 
casts. One can make forecasts about hurricanes and the like with dice or 
by sacrificing a sheep (or by consulting a flock of dyspeptic parrots). One 
can also make such forecasts with classic algorithms, by simulating dynam- 
ical systems derived from atmospheric science. This sort of prediction is 
not the fundamental characteristic of machine learning. 

The fundamental characteristic of machine learning is that the system is
trained using a dataset consisting of examples of input features and out-
come measurements until, through the process of gradient descent, the
machine’s parameter values are so refined that the machine’s predictions
differ only minimally from the actual outcomes in the training dataset;
the hope is that its predictions for further inputs will be comparably
accurate. Judges, litigants, and their lawyers certainly
try to align their predictive statements of law with what they discern to 
be the relevant pattern in law’s input data, that is to say in the collective 
experience that shapes the law. It is equally the case, in Holmes’s under- 
standing of the law, that we do not test court judgments by comparing 
against stipulated “correct” labels the way our spam email or tiger detec- 
tor was tested. Judgments are, however, tested against future judgments. 
This is to the point we made earlier about the judge’s aim that her judg- 
ments withstand the test of time. The test is whether future judgments 
show her judgment to have been an accurate prediction, or at least not 
so far off as to be set aside and forgotten. 

A machine learning system must be trained on a dataset of input fea- 
tures and outcome measurements. This is in contrast to the classic algo- 
rithmic approach, which starts instead from rules. For example the clas- 
sic approach to forecasting the weather works by solving equations that 
describe how the atmosphere and oceans behave; it is based on scientific 
laws (which are presumably the result of codifying data from earlier exper- 
iments and observation). Just as machine learning rejects rules and starts 
instead with training data, Holmes rejected the idea that law consists in deriving
outcomes based on general principles, and he cast it instead as a predic- 
tion problem—prophesying what a court will do—to be performed on 
the basis of experience. 


5.3 LIMITS OF THE ANALOGY 


As we noted in Chapter 3,28 the predictions made by a machine learning
system must have the same form as the outcomes in the training dataset,
and the input data for the object to be predicted must have the same form 
as objects already seen. In earlier applications of machine learning, “same 
form” was very narrowly construed: for example, the training set for the 
ImageNet challenge29 consists of images paired with labels; the machine
learning task is to predict which one of these previously seen labels is
the best fit for a new image, and the new image is required to have the
same dimensions as all the examples in the training set. Human ability to
make predictions about novel situations is far ahead of that of machines.
A human lawyer can extrapolate from experience and make predictions 
about new cases that don’t conform to a narrow definition of “cases simi- 
lar to those already seen.” The distance is closing, however, as researchers 
develop techniques to broaden the meaning of “same form.” For exam-
ple, an image captioning system30 is now able to generate descriptions of
images, rather than just repeat labels it has already seen. Thus, it is well 
within their grasp for machines to label an image as “tiger on fire in a 
forest,” but they are still a long way, probably, from describing, as the 
poet did, the tiger’s “fearful symmetry.” 

There is a more significant difference between predictions in machine 
learning and in law. In machine learning, the paradigm is that there is 
something for the learning agent—i.e., the machine—to learn. The thing 
to learn is data, something that is given, not a changing environment 
affected by numerous factors—including by the learning agent. A machine 
for translating English to French can be trained using a human-translated 
corpus of texts, and its translations can be evaluated by how well they 
match the human translation. Whatever translations the machine comes
up with, they alter neither the English nor the French language. In law, by
contrast, the judgment in a case becomes part of the body of experience 
to be used in subsequent cases. Herein, we think, Holmes’s concept of 
law as a system constituted from prediction may hold lessons for machine 
learning. In Chapters 6-8, we will consider some challenges that machine 
learning faces, and possible lessons from Holmes, as we discuss “explain-
ability” of machine learning outputs31 and outputs that may have invidi-
ous effects because they reflect patterns that emerge from the data (such
as patterns of racial or gender discrimination).32 In Chapter 9,33 we will
suggest that Holmes, because he understood law to be a self-referential 
process in which each new prediction shapes future predictions, might 
point the way for future advances in machine learning. 

Before we get to the challenges of machine learning and possible 
lessons for the future from Holmes, we will briefly consider a question 
that prediction raises: does prediction, whether as the constitutive ele- 
ment of law or as the output of machine learning, necessarily involve the 
assessment of probabilities? 


5.4 PROBABILISTIC REASONING AND PREDICTION


“For the rational study of the law the blackletter man may be the man 
of the present, but the man of the future is the man of statistics,” said 
Holmes.34 It is not certain that Holmes thought that the predictive char-
acter of law necessarily entails a probabilistic character for law. He was
certainly interested in probability. In the time after his Civil War service, a
period that Frederic Kellogg closely examined in Oliver Wendell Holmes,
Jr. and Legal Logic, Holmes studied theories of probability and was
much engaged in discussions about the phenomenon, including how it
relates to logic and syllogism.35 Later, as a judge, he recognized the part
played by probability in commercial life, for example in the functioning 
of the futures markets.36 In personal correspondence, Holmes said that
early in his life he had learned “that I must not say necessary about the 
universe, that we don’t know whether anything is necessary or not. So 
that I describe myself as a bettabilitarian. I believe that we can bet on
the behavior of the universe...”37 Holmes would have been comfortable
with the idea that law, in its character as prediction, concerned prob- 
ability as well. Some jurists indeed have discerned in Holmes’s idea of 
law-as-prophecy just such a link.38

Predictions made by machine learning are not inherently probabilistic. 
For example, the “k nearest neighbors”39 machine learning algorithm is
simply “To predict the outcome for a new case, find the k most similar 
cases in the dataset, find their average outcome, and report this as the 
prediction.” The system predicts a value, which may or may not turn out 
to be correct. Modern machine learning systems such as neural networks, 
however, are typically designed to generate predictions using the language 
of probability, for example “the probability that this given input image 
depicts a kitten is 93%.”40
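
The nearest-neighbors rule quoted above is short enough to write out in full. The sketch below, in Python, is one plausible reading of it: it assumes numeric features, Euclidean distance as the measure of similarity, and a numeric outcome that can be averaged. The function name knn_predict and the toy dataset are hypothetical. For contrast, a second function, softmax, shows one common way in which a system's raw scores can be restated in the language of probability; the scores passed to it are invented for illustration.

```python
import math


def knn_predict(training_set, new_features, k=3):
    """Predict an outcome as the average outcome of the k most similar cases.

    training_set is a list of (features, outcome) pairs, where features is a
    tuple of numbers. Similarity is assumed to mean small Euclidean distance.
    """
    def distance(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    # Find the k cases in the dataset that are most similar to the new case.
    neighbors = sorted(
        training_set, key=lambda pair: distance(pair[0], new_features)
    )[:k]
    # Report their average outcome as the prediction.
    return sum(outcome for _, outcome in neighbors) / len(neighbors)


def softmax(scores):
    """Turn arbitrary scores into non-negative numbers that sum to one."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical dataset: one feature per case, with a numeric outcome.
cases = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

# A bare value, with no statement of confidence attached.
print(knn_predict(cases, (2.0,), k=3))  # 20.0, the average of 20.0, 10.0 and 30.0

# Invented scores for three candidate labels, restated as probabilities.
labels = ["kitten", "tiger", "forest"]
for label, p in zip(labels, softmax([2.0, 0.5, -1.0])):
    print(f"probability that the image depicts a {label}: {p:.0%}")
```

The first function returns a value and nothing more. The second attaches a degree of confidence to each candidate label, which is the form of output the quoted 93% figure illustrates.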

Separately, we can classify machine learning systems by whether or not 
they employ probabilistic reasoning to generate their predictions: 


[One type of] Machine Learning seeks to learn [probabilistic] models of 
data: define a space of possible models, learn the parameters and structure 
of the models from data; make predictions and decisions. [The other type 
of] Machine Learning is a toolbox of methods for processing data: feed the 
data into one of many possible methods; choose methods that have good 
theoretical or empirical performance; make predictions and decisions.41


Are legal predictions expressed in the language of probability? Lawyers
serving clients do not always give probability assessments when they give
predictions, but sometimes they do.42 Some clients need such an assess-
ment for purposes of internal controls, financial reporting, and the like.
Others ask for it for help in strategizing around legal risk. Modern empir-
ical turns in law scholarship, it may be added, are much concerned with
statistics.43 Attaching a probability to a prediction of a legal outcome is
an inexact exercise, but it is not unfamiliar to lawyers. 

Holmes, when he referred to the prophecies of what courts will do, 
is often read to mean that the law should be made readily predictable.44
Though we don’t doubt he preferred stable judges to erratic ones, we 
don’t see that that was Holmes’s point. Courts whose decisions are hard 
to predict are no less sources of legal decision. Even when the lawyer has 
the privilege to argue in front of a “good” judge, whom for present pur- 
poses we define as a judge whose decisions are easy to predict, the closer 
the legal question, the harder it is to predict the answer. It is inherent 
that lawyers will be more confident in some of their predictions than in 
others. 

Judges, practically by definition of their role as legal authorities, do not 
proffer a view as to the chances that their judgments are correct. It is hard 
to see how the process of judgment would keep the confidence of society, 
if every judgment were issued with a p-value!45 Yet reading judgments
through a realist’s glasses, one may discern indicia of how likely it is that 
the judgment will be understood in the future to have stated the law. 
Judges do not shy from describing some cases as clear ones; others as 
close ones. They don’t call it hedging, but that’s very much what it’s like. 
When a judge refers to how finely balanced such and such a question 
was, it has the effect of qualifying the judgment. It thus may be that one 
can infer from a judgment’s text how much confidence one should have 
in the judgment as a prediction of future results. The text, even where 
it does not express anything in terms about the closeness of a case, still 
may give clues. The structure of the reasoning may be a clue: the more 
complex and particularistic a judge’s reasoning, the more the judgment 
might be questioned, or at least limited in its future application. Textual 
clues permit an inference as to how confident one should be that the 
judgment accurately reflects a pattern in the experience that was the input 
behind it.46

Does the law use probabilistic reasoning to arrive at a prediction? In 
other words, once a judgment has been made and it becomes part of the 
body of legal experience, do lawyers and judges reason about their level of
confidence that an earlier judgment is relevant for their predictions about 
a current case? Ex post, every judgment is in fact, to a greater or lesser 
extent, questioned or rejected or ignored—or affirmed or relied upon. 
Nullification, reversal, striking down—by whatever term the legal system 
refers to the process, a rejection of a judgment by a controlling authority 
is a formal expression that the judge got it wrong.47 Endorsement, too,
is sometimes formal and explicit, the archetype being a decision on appeal 
that affirms the judgment. Formal and explicit signals, whether of rejec- 
tion or of reliance, entail a significant adjustment in how much confidence 
we should have in a judgment as a prediction of a future case. 

It is not just in appeals that we look for signals as to how confident we 
should be in a given judgment as a prediction of future cases. Rejection
or endorsement might occur in a different case on different facts (i.e.,
not on appeal in the same case), and in that situation it is only an
approximation. By ignoring, rejecting, or “distinguishing” a past judgment,
or by invoking it with approval, a judge in a different case says or implies
that the judge in the past case had the law wrong or had it right; but
indirect treatment in the new judgment, whether expressed or implied,
says only so much about the past one. A jurist, considering such indirect
treatment, would struggle to arrive at a numerical value to adjust how
much confidence to place in the past judgment.48 In evidence about
judgments (evidence inferable from the words of the judgments themselves
and evidence contained in their reception), one nevertheless discerns at
least rough markers of the probability that they will be followed in the 
future. 

There is no received view as to what Holmes thought the function 
of probability is in prediction. As is the case with machine learning, 
jurists make probabilistic as well as non-probabilistic predictions. You can 
state the law—i.e., give a prediction about the future exercise of public 
power—without giving an assessment of your confidence that your pre- 
diction is right. Jurists also use both probabilistic and non-probabilistic 
reasoning. Holmes, when referring to prophecies, was not however telling 
courts how to reason (or for that matter, legislatures or juries; we will 
return to juries in Chapter 7). His concern was to state what it is that con- 
stitutes the law. True, we don’t call wobbly or inarticulate judges good 
judges. But Holmes was explicitly not concerned with the behavior of the 
“good” litigant; and, in his thinking about the legal system as a whole, 
his concern was not limited to the behavior of the “good” judge. 


CHAPTER 6


Explanations of Machine Learning 


The danger of which I speak is [...] the notion that a given system, ours, for instance, 
can be worked out like mathematics from some general axioms of conduct. [...] This 
mode of thinking is entirely natural. The training of lawyers is a training in logic. 
The processes of analogy, discrimination, and deduction are those in which they are 
most at home. The language of judicial decision is mainly the language of logic. 
And the logical method and form flatter that longing for certainty and for repose 
which is in every human mind. But certainty generally is illusion, and repose is not 
the destiny of man. 


Oliver Wendell Holmes, Jr., The Path of the Law (1897) 


Calls for “explainability” of machine learning outputs are much heard 
today, in academic and technical writing as well as in legislation such as 
the GDPR, the European Union’s General Data Protection Regulation.1
We are not convinced that very many lawmakers or regulators understand 
what would need to be done, if the explainability they call for is to be 
made meaningful.2 It might seem to a legislator, accustomed to using the
logical language of law and legal decisions, that an algorithmic decision 
“can be worked out like mathematics from some general axioms.” In the 
law, Holmes rejected the idea that a logical argument gives a satisfactory 
explanation of a judicial decision. Instead, in a play upon Aristotle’s sys-
tem of logic, he invoked the “inarticulate major premise.”3 As a method
to account for legal decision making, that idea, as Holmes employed it, 
left to one side the formal logic that jurists classically had employed to 
explain a legal output. In this chapter we will suggest that Holmes’s idea 
of the inarticulate major premise offers a better way to think about expla-
nations in machine learning, and also throws fresh light on a fundamen-
tal philosophical stance in machine learning, the “prediction culture.”


6.1 HOLMES’S “INARTICULATE MAJOR PREMISE”


The premise behind a decision, it was Holmes’s view, is not always 
expressed. Legal decision-makers offer an apologia, a logical justification 
for the decision they have reached, but the real explanation for a decision 
is to be found in the broad contours of experience that the decision-maker 
brings to bear. As Holmes put it in 1881 in The Theory of Interpretation, 
decision-makers “leave their major premises inarticulate.”4

Holmes addressed this phenomenon again, and most famously, in his 
dissent in Lochner v. New York. To recall, the Supreme Court was asked 
to consider whether a New York state law that regulated working hours 
in bakeries and similar establishments was constitutionally infirm. The 
majority decided that it was. According to the majority, the law interfered 
with “liberty” as protected by the 14th Amendment of the Constitution. 
Holmes in his dissent wrote as follows: 


Some of these laws embody convictions or prejudices which judges are 
likely to share. Some may not. But a Constitution is not intended to 
embody a particular economic theory, whether of paternalism and the 
organic relation of the citizen to the state or of laissez faire. It is made 
for people of fundamentally differing views, and the accident of our finding 
certain opinions natural or familiar, or novel, and even shocking, ought not 
to conclude our judgment upon the question whether statutes embodying 
them conflict with the Constitution of the United States. 

General propositions do not decide concrete cases. The decision will 
depend on a judgment or intuition more subtle than any articulate major 
premise. But I think that the proposition just stated, if it is accepted, will 
carry us far toward the end. Every opinion tends to become a law. I think 
that the word ‘liberty’ in the 14th Amendment, is perverted when it is held 
to prevent the natural outcome of a dominant opinion, unless it can be said 
that a rational and fair man necessarily would admit that the statute pro- 
posed would infringe fundamental principles as they have been understood 
by the traditions of our people and our law. It does not need research to 
show that no such sweeping condemnation can be passed upon the statute 
before us.5


Holmes rejected the straightforward logical deduction that would have
read like this: The 14th Amendment protects liberty; the proposed statute 
limits freedom of contract; therefore the proposed statute is unconstitutional. 
He did not accept that this “general proposition[...]” contained in the 
14th Amendment could “decide concrete cases,” such as the case that the 
New York working hours law presented in Lochner. It was Holmes’s suspi-
cion that laissez faire economic belief on the part of the other judges was
the inarticulate premise lurking behind the majority opinion, the premise 
that had led them to offer their particular deduction. Holmes posited 
instead that the meaning of the word “liberty” in the 14th Amend-
ment should be interpreted in the light of the “traditions of our peo-
ple and our law” and that applying “judgment or intuition” to the state
of affairs prevailing in early twentieth century America reveals that the 
pattern of “dominant opinion” did not favor an absolute free market. 
Holmes concluded that the dominant opinion was for an interpretation 
that upheld the New York state statute limiting working hours. It was only 
to a judge who had laissez faire economic beliefs that the statute would 
appear “novel, and even shocking.” It was not a syllogism but the major- 
ity judges’ economic beliefs—and accompanying sense of shock—that led 
them to strike the statute down.6

So a judge who expresses reasons for a judgment might not in truth 
be explaining his judgment. Such behavior is plainly at odds with the for- 
mal requirements of adjudication: the judge is supposed to say how he 
reaches his decisions. Holmes assumed that judges do not always do that. 
Their decisions are outputs derived from patterns found in experience,7
not answers arrived at through logical proof. In Holmes’s view, even 
legal texts, like constitutions, statutes, and past judgments, do not speak 
for themselves. As for artefacts of non-textual “experience” (the sources
whose “significance is vital, not formal”8), those display their patterns
even less obviously. Holmes thought that all the elements of experience, 
taken in aggregate, were the material from which derives the “judgment 
or intuition more subtle than any articulate major premise.” It might not 
even “need research to show” what the premise is. How a decision-maker 
got from experience to decision—how the decision-maker found a pattern 
in the data, indeed even what data the decision-maker found the pattern 
in—thus remains unstated and thus obscure. 


6.2 MACHINE LEARNING’S
INARTICULATE MAJOR PREMISE 


It is said—and it is the premise behind such regulatory measures as the 
GDPR—that machine learning outputs require explanation. Holmes’s 
idea of the inarticulate major premise speaks directly to the problem of 
how to satisfy this requirement. Holmes said that the logic presented in 
a judicial decision to justify that decision was not an adequate explana- 
tion, and that for a full explanation one must look also to the body of 
experience that judges carry with them. For Holmes, the formal principle 
stated in a statute, and even in a constitutional provision, is not an ade- 
quate guide to the law, because to discern its proper meaning one must 
look at the traditions and opinions behind it. 

Likewise, when considering the output of a machine learning system, 
the logic of its algorithms cannot supply an adequate explanation. We 
must look to the machine’s “experience,” i.e. to its training dataset. 

One reads in Articles 13, 14, and 15 of the GDPR, the central loci of 
explainability, that meaningful information about an automated decision 
will come from disclosing “the logic involved.”9 This is a category error.
A machine learning output cannot be meaningfully assessed as if it were 
merely a formula or a sum. A policymaker or regulator who thinks that 
machine learning is like that is like the unnamed judge whom Holmes 
made fun of for thinking a fault in a court judgment could be identified 
the way a mistake might be in arithmetic, or the named judges whom 
he said erred when they deduced that working hour limits on bakers are 
unconstitutional. The logic of deduction, in Holmes’s idea of the law, is 
not where law comes from; it is certainly not where outputs come from in
machine learning. The real source—the inarticulate major premise of
law and of machine learning alike—is the data or experience. 

If you follow Holmes and wish to explain how a law or judgment came 
to be, you need to know the experience behind it. If you wish to explain 
how a machine learning process generated a given output, you need to 
know the data that was used to train the machine. If you wish to make 
machine learning systems accountable, look to their training data, not their
code. If there is something that one does not like in the experience or in 
the data, then chances are that there will be something that one does not 
like in the legal decision or the output. 

6.3 THE TWO CULTURES: SCIENTIFIC EXPLANATION
VERSUS MACHINE LEARNING PREDICTION 


To explain a decision, then, one must explain in terms of the data or 
experience behind the decision. But what constitutes a satisfactory expla- 
nation? In Law in Science and Science in Law, Holmes in 1899 opened 
the inquiry like this: 


What do we mean when we talk about explaining a thing? A hundred 
years ago men explained any part of the universe by showing its fitness 
for certain ends, and demonstrating what they conceived to be its final 
cause according to a providential scheme. In our less theological and more 
scientific day, we explain an object by tracing the order and process of its 
growth and development from a starting point assumed as given.10


Even where the “object” to be explained is a written constitution, Holmes 
said an explanation is arrived at “by tracing the order and process of its 
growth and development,” as if the lawyer were a scientist examining 
embryo development under a microscope.11 And yet for all of Holmes’s
scientific leaning, his best known epigram is expressed with an emphat- 
ically non-scientific word: “The prophecies of what the courts will do in 
fact, and nothing more pretentious, are what I mean by the law.”12

Holmes seems to anticipate a tension that the philosophy of science has 
touched on since the 1960s and that the nascent discipline of machine 
learning since the 2000s has brought to the fore: the tension between 
explaining and predicting. 

In the philosophy of science, as advanced in particular by Hempel,13 an
explanation consists of (i) an explanans consisting of one or more “laws 
of nature” combined with information about the initial conditions, (ii) an 
explanandum which is the outcome, and (iii) a deductive argument to go 
from the explanans to the explanandum. In fact, as Shmueli describes in
To explain or to predict?,14 his thoughtful examination of the practice of
statistical modelling, it is actually the other way round: the goal of
statistical modelling in science is to make inferences about the “laws of 
nature” given observations of outcomes. Terms like “law” and “rule” are 
used here. Such terms might suggest stipulation, like legal statutes, but in 
this context they simply mean scientific or engineering laws: they could 
be causal models15 that aim to approximate nature, or they could simply
be equations that describe correlations. 

In the machine learning/prediction culture championed by Leo
Breiman in his 2001 rallying call Statistical modelling: the two cultures, 
from which we quoted at the opening of Chapter 1, the epistemological 
stance is that explanation in terms of laws is irrelevant; all that matters 
is the ability to make good predictions. The denizens of the prediction 
culture sometimes have an air of condescension, hinting that scientists 
who insist on an explanation for every phenomenon are simpletons who, 
if they do not understand how a system works, can’t imagine that it has 
any value. In the paper describing their success at the ImageNet Challenge in
2012—following which the current boom in machine learning began— 
Krizhevsky et al. noted the challenge of getting past the gatekeepers of
scientific-explanatory culture: “[A] paper by Yann LeCun and his collab- 
orators was rejected by the leading computer vision conference on the 
grounds that it used neural networks and therefore provided no insight 
into how to design a vision system.”16 LeCun went on to win the 2018
Turing Award (the “Nobel prize for computer science”) for his work on
neural networks17 and to serve as Chief AI Scientist for Facebook.18

Here is an illustration of the difference between the two cultures, as 
applied to legal outcomes. Suppose our goal is to find a formula for the 
probability that defendants will flee if released on bail: here we are infer- 
ring a rule, a formula that relates the features of objects under consid- 
eration to the outcomes, and which can be applied to any defendant. Or 
suppose our goal is to determine whether the probability is higher for vio- 
lent crime or for drug crime all else being equal: here again we are making 
an inference about rules (although this is a more subtle type of inference, 
a comparative statement about two rules which does not actually require 
those rules to be stated explicitly). 

By contrast, suppose our goal is to build an app that estimates the 
probability that a given defendant will flee: here we are engaging in
prediction.19 We might make a prediction using syllogistic inference, or by
reading entrails, or with the help of machine learning. The distinguishing 
characteristic of prediction is that we are making a claim about how some 
particular case is going to go. 
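
The contrast can be sketched in a few lines of Python. Everything below is invented for illustration: the records, the field layout, and the nearest-neighbour shortcut are our assumptions, not anything drawn from real bail data or from a particular machine learning system. The first function makes a comparative statement about groups of cases; the second commits to a number for one particular defendant. The simple rate comparison does not hold all else equal, so a real inference would need to control for other features.

```python
from collections import defaultdict

# Hypothetical records: (crime_type, prior_failures_to_appear, fled).
RECORDS = [
    ("violent", 0, False), ("violent", 1, True),
    ("violent", 0, False), ("violent", 2, True),
    ("drug", 0, False), ("drug", 0, False),
    ("drug", 1, False), ("drug", 2, True),
]


def flight_rate_by_crime(records):
    """Inference about rules: compare flight rates across crime types."""
    counts = defaultdict(lambda: [0, 0])  # crime_type -> [fled, total]
    for crime, _priors, fled in records:
        counts[crime][0] += int(fled)
        counts[crime][1] += 1
    return {crime: fled / total for crime, (fled, total) in counts.items()}


def predict_flight(records, crime, priors, k=2):
    """Prediction: estimate flight probability for one particular defendant
    from the k most similar past cases with the same crime type."""
    same_crime = [r for r in records if r[0] == crime]
    if not same_crime:
        raise ValueError("no comparable past cases")
    nearest = sorted(same_crime, key=lambda r: abs(r[1] - priors))[:k]
    return sum(r[2] for r in nearest) / len(nearest)


# A statement about rules (which group flees more often in these records)...
print(flight_rate_by_crime(RECORDS))
# ...versus a claim about how one particular case is going to go.
print(predict_flight(RECORDS, crime="drug", priors=2))
```

The second function never states a rule at all; it simply consults the past cases each time it is asked.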

Making a prediction about a particular case and formulating a rule of 
general application are tightly interwoven. Their interweaving is visible in 
judicial settings. One of Holmes’s Supreme Court judgments is an exam- 
ple. Typhoid fever had broken out in St. Louis, Missouri. The State of 
Missouri sued Illinois, on the theory that the outbreak was caused by a 
recent change in how the State of Illinois was managing the river at the 
city of Chicago. In State of Missouri v. State of Illinois, Holmes summarized
Missouri’s argument as follows:


The plaintiff’s case depends upon an inference of the unseen. It draws
the inference from two propositions. First, that typhoid fever has increased 
considerably since the change, and that other explanations have been dis- 
proved; and second, that the bacillus of typhoid can, and does survive the 
journey and reach the intake of St. Louis in the Mississippi.20


In support of this second proposition, Missouri put forward rules, for- 
mulated with reference to its experts’ observations, stating how long the 
typhoid bacillus survives in a river and how fast the Mississippi River 
might carry it from Chicago to St. Louis. If you accept the rules that Mis- 
souri formulated from its experts’ observations, then you could express 
the situation like this: 


Let x = miles of river between downstream location of outbreak and
upstream location of a typhoid bacillus source.

Let y = rate in miles per day at which typhoid bacillus travels downstream
in the river.

Let z = maximum days typhoid bacillus survives in the river.

If x ÷ y < z, then the bacillus survives—and downstream plaintiff
wins;

If x ÷ y > z, then the bacillus does not survive—and downstream
plaintiff loses.
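
Read as arithmetic, the comparison is between travel time and survival time: distance divided by rate gives the days the bacillus spends in transit, and that figure is set against z. A minimal Python sketch follows. The function name and the figures are hypothetical and are not taken from the record in the case.

```python
def bacillus_survives(x_miles: float, y_miles_per_day: float, z_days: float) -> bool:
    """Return True if the bacillus reaches the downstream location alive.

    x_miles: miles of river between the source and the outbreak
    y_miles_per_day: rate at which the bacillus travels downstream
    z_days: maximum days the bacillus survives in the river
    """
    if y_miles_per_day <= 0:
        raise ValueError("rate must be positive")
    travel_days = x_miles / y_miles_per_day
    return travel_days < z_days


# Hypothetical figures for illustration only.
print(bacillus_survives(x_miles=300, y_miles_per_day=30, z_days=14))  # 10 days in transit
print(bacillus_survives(x_miles=300, y_miles_per_day=15, z_days=14))  # 20 days in transit
```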


Expressing the situation this way necessarily has implications for other 
cases. Justice Holmes drew attention to the implications: the winning for- 
mula for Missouri as plaintiff against Illinois might well later have been a 
losing one for Missouri as defendant against a different state. “The plain- 
tiff,” Holmes wrote, “obviously must be cautious upon this point, for if 
this suit should succeed, many others would follow, and it not improba- 
bly would find itself a defendant to a [suit] by one or more of the states 
lower down upon the Mississippi.”21

Missouri was making an inference of the unseen in a particular
instance, which in machine learning terminology is referred to as predic- 
tion. Missouri used general propositions to support this prediction, and 
Holmes (with his well-known distrust of the general proposition) warned 
that such reasoning can come back to bite the plaintiff. 

The difference between finding rules and making predictions might 
seem slight. If we have a rule, we can use it to make predictions about 
future cases; if we have a mechanism for making predictions, that mech- 
anism may be seen as the embodiment of a rule. Hempel did not see 
any great difference between explanation and prediction. To Hempel, an 
explanation is after the fact, a prediction is before the fact, and the same 
sort of deductive reasoning from natural laws applies in both cases. 

But what if it is beyond the grasp of a simple-minded philosopher— 
or, for that matter, of any human being—to reason about the predictive 
mechanism? This is the real dividing line between the two cultures. The 
scientific culture is interested in making inferences about rules, hence a 
fortiori practitioners in the scientific culture will only consider rules of 
a form that can be reasoned about. The prediction culture, by contrast, 
cares about prediction accuracy, even if the prediction mechanism is so 
complex it seems like magic. 

Arthur C. Clarke memorably said, “Any sufficiently advanced technology
is indistinguishable from magic.”22 Clarke seems to have been
thinking about artefacts of a civilization more advanced than that of the 
observer trying to comprehend them. Thus, a stone age observer, pre- 
sented with a video image on a mobile phone, might think it magical. It 
would take more than moving pictures to enchant present-day observers, 
but we as a society have built technological artefacts whose functioning 
we struggle to explain. 

The prediction culture says that we should evaluate an artefact, even 
one that seems like magic, by whether or not it actually works. We can still 
make use of machines that embody impenetrable mechanisms; we should 
evaluate them based on black-box observations of their predictive accu- 
racy. A nice illustration may be taken from a case from the U.S. Court of 
Appeals for the 7th Circuit in 2008. A company had been touting metal 
bracelets. The company’s assertions that the bracelets were effective as 
a cure for various ailments were challenged as fraudulent. Chief Judge 
Easterbrook, writing for the 7th Circuit and recalling the words of Arthur
C. Clarke quoted above, was dubious about “a person
who promotes a product that contemporary technology does not understand”;
he said that such a person “must establish that this ‘magic’ actually
works. Proof is what separates an effect new to science from a swindle.”23
Implicit here is that the “proof,” while it might establish that the “magic”
works, does not necessarily say anything about how it works. Predicting
and explaining are different operations. Easterbrook indeed goes on to say 
that a placebo-controlled, double-blind study—that is, the sort of study 
prescribed by the FDA for testing products that somebody hopes to mar- 
ket as having medical efficacy—is “the best test” as regards assertions of 
medical efficacy of a product.24 Such a test, in itself, solely measures
outputs of the (alleged) medical device; it is not “proof” in the sense of a
mathematical derivation. It does not require any understanding of how 
the mechanism works; it is just a demonstration that it does work. True, 
a full-scale FDA approval process—a process of proof that is centered 
around the placebo-controlled, double-blind study that the judge men- 
tions—also requires theorizing as to how the mechanism works, not just 
black-box analysis. But Easterbrook here, focusing on proof of efficacy, 
makes a point much along the lines of Breiman: a mechanism can be eval- 
uated purely on whether one is satisfied with its outcomes, rather than on 
considerations such as parsimony or interpretability or consonance with 
theory.25 A mechanism can be evaluated by seeking to establish whether
“this ‘magic’ actually works.” 
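
Evaluation of this kind can be sketched in Python. The held-out cases and the stand-in predictor below are hypothetical; the point is only that the scoring function never looks inside the mechanism it is scoring.

```python
from typing import Callable, Iterable, Tuple


def black_box_accuracy(predict: Callable[[dict], bool],
                       cases: Iterable[Tuple[dict, bool]]) -> float:
    """Score an opaque predictor purely on its outcomes.

    Nothing here inspects how `predict` works; it is only called, and its
    answers are compared with what actually happened.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one observed case")
    hits = sum(predict(features) == outcome for features, outcome in cases)
    return hits / len(cases)


# Hypothetical held-out cases: (features, observed outcome).
HELD_OUT = [
    ({"priors": 0}, False),
    ({"priors": 1}, False),
    ({"priors": 2}, True),
    ({"priors": 3}, True),
]


def opaque_predictor(features: dict) -> bool:
    """A stand-in mechanism; it could equally be a large neural network."""
    return features["priors"] >= 2


print(black_box_accuracy(opaque_predictor, HELD_OUT))
```

Swapping in a different predictor changes the score but not the test, which is what it means to evaluate on outcomes alone.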

Holmes made clear his view that a judicial explanation is really an 
apologia rather than an explanation, and that the real explanation is to be 
found by looking for the “inarticulate major premise” that comes from 
the jurist’s body of experience. Holmes shied away from asking for logical 
or scientific explanations as a way to understand the jurist’s experience. 
He instead invoked prophecy. Holmes went beyond logic (because sim- 
ple mathematical arguments are inadequate), and beyond scientific expla- 
nation (perhaps because such explanation would either be inaccurate or 
incomprehensible when applied to jurists’ behavior), and he came finally 
to prediction. In this, Holmes anticipated machine learning. 


6.4 WHY WE STILL WANT EXPLANATIONS


The inarticulate major premise, starting immediately after Holmes’s 
dissent in Lochner, provoked concern, and it continues to.26
Unexplained decisions, or decisions where the true reasons are obscured, are
inscrutable, and, therefore, the observer has no way to tell whether the 
reasons are valid. Validity, for this purpose, may mean technical correctness;
it also may mean consonance with basic values of society. Testing
validity in both these senses is an objective behind explainability. We turn
here in particular to values.27

Eminent readers of Holmes conclude that he didn’t have much to say 
about values.28 But he was abundantly clear that, whatever the values in
society might be, if they form a strong enough pattern, then they are 
likely to find expression in law: “Every opinion tends to become law.”29
Whether or not one has an opinion about the opinion that becomes law, 
Holmes described a process that has considerable present-day resonance. 
Data from society at large will embody opinions held in society at large; 
and thus a machine learning output derived from a pattern found in the 
data will itself bear the mark of those opinions. 

The influence of opinions held in society would be quite straightfor- 
ward if there were no conflicting opinions. But many opinions do conflict. 
Holmes plainly was concerned about discordance over values; it was to 
accommodate “fundamentally differing views” that he said societies adopt 
constitutions.30 Less clear is whether he thought that certain values are
immutable, imprescriptible, or in some fashion immune to derogation. 
He suggested that some might be: he said that a statute might “infringe 
fundamental principles.” He didn’t say what principles might be funda- 
mental. 

A law, if it embodied certain biases held in society, would infringe prin- 
ciples held to be fundamental today. Examples include racial and gender 
bias. In Holmes’s terms, those are “opinions” that should not “become 
law.” Preventing them from becoming law is a central concern today. The 
concern arises, mutatis mutandis, with machine learning outputs. Where 
machine learning outputs have legal effects, they too will infringe fun- 
damental principles, if they embody biases such as racial or gender bias. 
Preventing such “opinions” from having such wider influence is one of 
the main reasons that policy makers and writers have called for explain- 
ability. 

In short, in both processes, law and machine learning, the risk exists 
that experience or data shaped a decision that ought not have been 
allowed to.31 In both, however, the experience or the data might not
be readily visible.32 As we will explore in Chapters 7 and 8, much of the
concern over machine learning’s potential impact on societal values relates
to this obscurity in its operation.

arxiv.org/pdf/1810.08810.pdf.

See generally Pasquale (2015). Though the emphasis in the 2015 title on 
algorithms is misplaced, Pasquale elsewhere has addressed distinct prob- 
lems arising from machine learning: Pasquale (2016). 

CHAPTER 7


Juries and Other Reliable Predictors 


In the formalist’s account, a jury settles factual questions; only the judge 
settles questions of law.1 Holmes thought that juries exercise a wider
influence than that.2 “I don’t like to be told that I am usurping the
functions of the jury if I venture to settle the standard of conduct myself 
in a plain case,” Holmes wrote in dryly humorous vein to his long-time 
friend Frederick Pollock. “Of course, I admit that any really difficult 
question of law is for the jury, but I also don’t like to hear it called a 
question of fact...”3 Whatever the proper way to describe the division of 
responsibilities between judge and jury, Holmes saw the latter to be a sort 
of conduit into the courtroom of the general understandings—and feel- 
ings—prevalent in the community. In its best aspect, the jury brought the 
collective experience of the community to questions that are best settled
through the application of common sense. The jury, as an embodiment of 
community experience, is called upon when what matters is “the nature 
of the act, and the kind and degree of harm done, considered in the 
light of expediency and usage.”4 However, the jury is involved in several
vexing problems. Analogous problems arise with machine learning. 


7.1 PROBLEMS WITH JURIES, 
PROBLEMS WITH MACHINES 


Juries, wrote Holmes, “will introduce into their verdict a certain 
amount—a very large amount, so far as I have observed—of popular
prejudice.”5 True, juries “thus keep the administration of law in accord with
the wishes and feelings of the community.”6 Reading this “accord with”
clause on its own, one might think it flattery. Reading in context, one sees 
that Holmes had no intention to flatter juries. Addressing why we have 
juries at all, Holmes added, 


[S]uch a justification is a little like that which an eminent English barrister 
gave me many years ago for the distinction between barristers and solici- 
tors. It was in substance that if law was to be practised somebody had to 
be damned, and he preferred that it should be somebody else.7


Holmes, in stating that juries produce results “in accord with the wishes 
and feelings of the community,” was stating a fact about the behavior
of juries as he had many times observed it. He was not
conferring a blessing upon them for behaving that way. 

Three particular problems with juries suggested to Holmes a need 
for caution. In each a similar problem may be discerned with machine 
learning. 

First, there is a problem of accountability. Holmes, in referring to the 
barrister who preferred that “somebody else” bear the blame, was refer- 
ring to the practice of English advocates not to make a point about advo- 
cacy but to make a point about decision-making. If a judge is faced with 
an intractable question about which he would really prefer not to make 
a decision but must for purposes of deciding the case, he looks for a way 
to send that question to the jury. Some decisions are unlikely to please 
all parties concerned. Some decisions are likely, instead, to provoke criti- 
cism and resistance. If an authority, such as a judge, has the discretion to 
devolve such decisions upon another actor, then the temptation exists to 
do so, because if the other actor makes the decision then the authority 
removes himself from blame. 

A temptation to devolve decisions exists with machine learning. The 
role of machine learning systems in actual decision practice grows apace. 
The growth is visible, or foreseeable, in banks,8 on highways,9 on the
battlefield,10 and in the courtroom.11 Instead of a human being making the
decision—such as a bank loan officer or a sentencing judge—the deci- 
sion is given to a machine. A certain distance now separates the human 
being from the decision and its consequences. If “somebody had to be 
damned,” e.g. for denying home mortgages or giving long jail sentences 
on invidious criteria, then authorities and their institutions might well like 
to see that distance increase. Given enough distance, the machine makes 
the decision and is, perhaps, to be damned; but the human being holds
up his hands and declares, Don’t look at me. 

So far, decision making by machine has not relieved human beings of 
the duty, where the law imposes it, to give account for actions that they 
set in train. It has, however, started to raise questions of causation.12 Those
questions are likely to multiply if the separation continues to increase (as 
it probably will) between a human being setting a machine process in 
train and the practical impact of the machine decision. Proposals, such 
as that in the European Parliament in 2017 to confer legal personality 
on machines,13 would pave the way to even further separation (and for
that reason, among others, are a bad idea).14 We find a timely message in
Holmes’s caution toward the devolution of decisions to juries. 

Second, there is propagation of “popular prejudice.” Just as the jury 
is faithful to the experience it brings to the courtroom and thus delivers
a verdict that reflects the patterns in that experience, so will the machine 
be faithful to its training data. If the training data embody a pattern of 
community prejudice that we do not wish to follow, then we will not 
be pleased with the decision that the machine delivers.15 Holmes viewed
juries as a mechanism to transmit community experience into legal decision.
To affirm that they are reliable in that function is to concede a certain
admiration, but it is just as much to sound a warning. No less reliable
is the machine learning system in producing an output using the data that 
it is given. Both are mechanisms that rely on their givens—their experi- 
ence and the data. Their outputs necessarily reflect the patterns they find 
therein.16
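
A toy sketch in Python makes the point concrete. The past decisions below are invented, and the “learning” is nothing more than recording the most common past outcome for each group; that is an assumption chosen for brevity, not a description of any deployed system. Even so, the mechanism reproduces whatever skew its givens contain.

```python
from collections import Counter, defaultdict

# Hypothetical past decisions: (neighbourhood, approved). The skew against
# neighbourhood "B" is invented to stand in for community prejudice.
PAST_DECISIONS = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", False), ("B", True),
]


def train(decisions):
    """'Learn' by recording the most common past outcome for each group."""
    by_group = defaultdict(Counter)
    for group, approved in decisions:
        by_group[group][approved] += 1
    return {group: counts.most_common(1)[0][0]
            for group, counts in by_group.items()}


model = train(PAST_DECISIONS)

# The model is faithful to its givens: it repeats the pattern it was shown.
for group in ("A", "B"):
    print(group, "approved" if model[group] else "denied")
```

Nothing in the code mentions prejudice; the pattern arrives entirely through the data.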

Finally, there is the problem of how to scrutinize black-box decision 
making. When policy-makers call for machine decision making to be scru- 
tinized, they encounter a difficulty resembling that which arises when we 
try to figure out how a jury reached its decision: it is difficult to see inside 
the decision making mechanism. In Chapter 2,17 we addressed the challenges
involved in explaining the outputs of black-box systems. Holmes
noted the challenges in testing juries and similar bodies. In considering a 
decision by a state taxation tribunal, which, because it was constituted
of laypersons, was like a jury in the relevant way, Holmes wrote,
“how uncertain are the elements of the evidence, and in what unusual
paths it moves”18; an appellate court must “mak[e] allowance for a certain
vagueness of ideas to be expected in the lay mind.”19 Even the judge,
thought Holmes, tended, “[w]here there is doubt...” to produce decisions
derived from reasons that are “disguised and unconscious.”20 Even
the judge, schooled in the formalities of logic, produces outputs that are
inscrutable. The jury is all the less likely to supply a clear trace of what led 
it to decide as it did. Here again is the phenomenon of occluded reason- 
ing, the hidden layers of legal process. Explaining a machine learning out- 
put presents in a new setting a problem familiar since at least Holmes’s day 
in law. 


7.2 WHAT TO DO ABOUT THE PREDICTORS?


Holmes, after he came to be cast in the role of progenitor of legal real- 
ism, was widely described as complacent about social problems, callous, 
even, in the face of injustice. Present-day observers have gone so far as 
to call Holmes “corrosive” because he seemed to accept the state of 
affairs as it was.21 In one judgment, which his critics have often cited,
Holmes upheld a statute of Virginia under which the Commonwealth 
sterilized certain persons deemed “feebleminded.”22 Holmes supplied
personal material, too, that later observers would use to characterize him 
as cold or resigned. Writing in 1927 in a letter to his friend Harold Laski, 
Holmes said, “I do accept ‘a rough equation’ between isness and oughtness.”23
To Morris Cohen, he wrote in 1921, “I do in a sense worship
the inevitable.”24 Writing around the same time to another correspondent,
Holmes doubted that rational improvements would be made in the
world at least in the immediate future: 


We all try to make the kind of a world that we should like. What we 
like lies too deep for argument and can be changed only gradually, often 
through the experience of many generations.25


One might read Holmes to counsel acceptance of whatever prediction 
the patterns in past experience suggest. And, yet, Holmes took account 
of the desire for change—the effort by all to “make the kind of a world 
that we should like.” He also showed personal interest in members of 
the younger generation who sought change; the correspondent to whom 
he wrote about the kind of a world that we should like was John C. H. 
Wu, a 22-year-old law student at the time and much concerned with
lifting China, his native land, out of its then century-long malaise.26 The
standard account—of Holmes as fatalist—is not supported by his record 
of encouraging people such as Wu in their efforts to escape the limits of 
experience. 

Moreover, concerning experience and its influence on decision,
Holmes did not think that all mechanisms for discerning patterns in 
experience merit equal deference. The legislature was the mechanism, in 
Holmes’s understanding, that a court was least to question; deference to 
the legislature was a precept that Holmes faithfully applied. It was that 
deference that is visible in his judgment in the Virginia sterilization case. 
One sees it as well in his dissents in Lochner and similar cases, where 
he decried the Court for second-guessing laws that had been enacted 
under proper legislative procedures and that aimed at various purposes 
that today would be called socially progressive. 

In a number of appeals from jury verdicts, Holmes had a very differ- 
ent response than he had to challenges against statutes. He overturned 
jury verdicts or dissented against majorities that didn’t. Moore v. Dempsey 
is the most prominent of the cases in which Holmes considered how a 
jury, perfectly “in accord with the wishes and feelings of the community,” 
produced a result that demanded correction. Five defendants, all African 
American, had been arrested. The grand jury that returned indictments
against them had been composed of whites only, including the members
of an attempted lynch mob. The evidence against them was scant;
their defense lawyers, appointed by the court, had done nothing in their 
defense; and the “Court and the neighborhood were thronged with an 
adverse crowd that threatened the most dangerous consequences to anyone
interfering with the desired result.”27 The jury at trial found the
defendants guilty; the sentence was death. In Holmes’s words, “counsel, 
jury and judge were swept to the fatal end by an irresistible wave of public
passion.”28 True, the situation in Moore v. Dempsey had been that the
facts “if true as alleged... [made] the trial absolutely void.”29 The body
assembled to function as a jury had not functioned as a jury at all. At the 
same time, the transactions in the courthouse exemplified, in extremis, the 
jury as decision-maker. Extreme example though it was, the case shone a 
light on how the jury works, and juries work that way even when all per- 
sons involved have acted in good faith: juries find their patterns in the 
experience around them. A corrective is called for, when that experience 
discords with our better understanding of the world we wish to have. 

As the data that trains the machine learning system is a given, both in 
Latin grammar and in the process of machine learning, so too is the expe- 
rience that Holmes understood to be the main influence on law. Holmes’s 
personal outlook was congenial to accepting givens. However, Holmes 
was alert to the danger that givens present for certain kinds of decision-making
that rely upon them as their inputs. He didn’t suggest that the
jury could do any differently. He placed the corrective someplace else, 
namely in the hands of the court of appeal. We will turn in the next 
chapter to consider more closely how Holmes understood the corrective 
to work in a particular situation—that where public authorities had gar- 
nered evidence in breach of the constitutional right of a defendant—and 
how a practice we will call inferential restraint has been necessary in legal 
procedures and is likely to be in machine learning as well. 

CHAPTER 8


Poisonous Datasets, Poisonous Trees 


As we addressed in preceding chapters,1 data, or “experience” as Holmes
referred to the inputs in law, influences decision-making in a number of 
ways. It might influence decision-making in such a way that the deci- 
sion made is illegal, immoral, unethical, or undesirable on some other 
grounds. Both legal decision-making and machine learning have struggled
over what to do when presented with data that might influence decision-making
in such a way. Courts have attempted to exclude particular pieces
of data, what we will call “bad evidence,” from the decision-making process
altogether.2 Exclusion before entry into the process removes the
problem: data that doesn’t enter doesn’t affect the process. However, 
exclusion also may introduce a problem. In both legal settings and in 
machine learning, particular pieces of evidence or data might have unde- 
sired effects, but the same inputs might assist the process or even be 
necessary to it. It also might be that no mechanism exists that will reli- 
ably exclude only what we aim to exclude. So exclusion, or its collateral 
effects, may erode the efficacy or integrity of the process. Lawyers refer 
to the “probative value” of a piece of evidence, an expression they use 
to indicate its utility to the decision process—even when a risk exists that 
that evidence might have undesired effects. The vocabulary of machine 
learning does not have such a received term here, but data scientists, as we 
described in Chapter 3, know well that the data that trains the machine 
is essential to its operation. 

Exclusion is not the only strategy courts have used to address bad evi- 
dence and its kindred problem, bias. Another is to restrain the inferences 
that the decision-maker draws from certain evidence that might otherwise 
have undesirable effects on decision-making. This strategy entails an 
adjustment to the inner workings of the process of decision itself. 

Finally, restraint may be imposed at a later stage. For example, courts 
review outputs (verdicts and judgments) and, if they’re not in accord with 
certain rules, strike them down, which, in turn, means that the instru- 
ments of public power will not act on them. In that strategy, a decision- 
making mechanism—for example, a jury, and as much might be said of 
a machine learning system—was not under inferential restraint (or it was 
but it ignored the restraint); the output the mechanism gives, on review, 
is unacceptable in some way; and, so, the output is not used. The restraint 
did not operate within the mental or computational machinery that gener- 
ated an output but, instead, upon those persons or instrumentalities who 
otherwise would have applied the output in the world at large. Defect in 
the output discerned, they don’t apply it. 

We turn now to consider more closely the problem of bad evidence; 
the limits of evidentiary exclusion as a strategy for dealing with bad evi- 
dence in machine learning; and the possibility that restraining the infer- 
ences drawn from data and restraining how we use the outputs that a 
machine reaches from data—strategies of restraint that have antecedents 
in jurisprudence—might be more promising approaches to the problem 
of bias in the machine learning age. 


8.1 THE PROBLEM OF BAD EVIDENCE 


As an Associate Justice of the Supreme Court, Holmes had occasion in 
Silverthorne Lumber Co. v. United States* to consider a case of bad evi- 
dence. Law enforcement officers had raided a lumber company’s premises 
“without a shadow of authority” to do so.” It was uncontested that, in 
carrying out the raid and taking books, papers, and documents from the 
premises, they had breached the Fourth Amendment, the provision of the 
United States Constitution that protects against unreasonable searches 
and seizures. The Government then sought a subpoena which would 
authorize its officers to seize the documents which they had earlier seized 
illegally. Holmes, writing for the Supreme Court, said that “the knowl- 
edge gained by the Government’s own wrong cannot be used by it in 
the way proposed.” In result, the Government would not be allowed 
to use the documents’; it would not be allowed to “avail itself of the 
knowledge obtained by that means.”*® Obviously, no judge could efface 
the knowledge actually gained and thus lodged in the minds of the 
government officers concerned. The solution was instead to place a limit on 
what those officers were permitted to do with the knowledge: they were 
forbidden from using it to evade the original exclusion. 

The “fruit of the poisonous tree,” as the principle of evidence applied 
in Silverthorne Lumber Co. came to be known, is invoked in connection 
with a range of evidentiary problems. Its distinctive feature is its 
application to “secondary” or “derivative” evidence,9 i.e., evidence such as 
that obtained by the Government in Silverthorne Lumber on the basis of 
evidence that had earlier been excluded. Silverthorne and, later, Nardone v. 
United States, where Holmes’s friend Felix Frankfurter gave the principle 
its well-known name, concerned a difficult question of causation. This is 
the question, a recurring one in criminal law, whether a concededly ille- 
gal search and seizure was really the basis of the knowledge that led to 
the acquisition of new evidence that the defendant now seeks to exclude. 
In the second Nardone case, Justice Frankfurter writing for the Court 
reasoned that the connection between the earlier illegal act and the new 
evidence “may have become so attenuated as to dissipate the taint.”10 
But if the connection is close enough, if “a substantial portion of the case 
against him was a fruit of the poisonous tree,”11 then the defendant, as 
of right, is not to be made to answer in court for that evidence.12 That 
evidence, if linked closely enough to the original bad evidence, is bad 
itself. 

We have noted three strategies for dealing with bad evidence: one of 
these is to cut out the bad evidence and so prevent it from entering the 
decision process in the first place. This strategy, which we will call data 
pruning,13 in a judicial setting is to rule certain evidence inadmissible. 
It is a complete answer, when you have an illegal search and seizure, to 
the question of what to do with the evidence the police gained from 
that search. You don’t let it in. A different strategy is called for, how- 
ever, if bad evidence already has entered some phase of a decision pro- 
cess. Judges are usually concerned here with the jury’s process of fact- 
finding. On close reading, one sees that Holmes in Silverthorne Lumber 
was concerned with the law enforcement officers’ process of investigation. 
In regard to either process, and various others, a strategy is called for that 
restrains the inferences one draws from bad evidence. We will call that 
strategy inferential restraint. Finally, and further down the chain, where 
a decision or other output might be turned into practical action in the 
world at large, a further sort of restraint comes into play: restraint upon 
action. We will call this variant of restraint executional restraint. Data 
pruning and the two variants of restraint, all familiar since Holmes’s day 
in American court rooms, have surfaced as possible strategies to address 
the problems that arise with data in machine learning. We will suggest, 
given the way machine learning works, that data pruning and strategies 
of restraint are not equally suited to address those problems. 


8.2 DATA PRUNING 


Excluding bad evidence from a decision process has at least two aims. 
For one, it has the aim of deterring impermissible practices by those who 
gather evidence, in particular officials with police powers. Courts exclude 
evidence “to compel respect for the constitutional guaranty [i.e., against 
warrantless search and seizure] in the only effectively available way—by 
removing the incentive to disregard it.”14 For another, it has the aim 
of preventing evidence from influencing a decision, if the evidence tends 
to produce unfair prejudice against the party subject to the decision. In 
machine learning, the first of these aims—deterring impermissible data- 
gathering practices—is not absent. It is present in regulations on data 
protection.15 Our main focus here is on the second aim: preventing 
certain data from influencing the decision.16 Data pruning is the main 
approach to achieving that aim in judicial settings. 

Data pruning avoids thorny questions of logic, in particular the prob- 
lem of attenuated causation. Just what inferences did the jury draw from 
the improper statements or evidence? Just what inferences did the police 
draw from the evidence gained from the unlawful search? And how did 
any such inferences affect future conduct (meaning future decision)? It is 
better not to have to ask those questions. This is a salient advantage of 
data pruning. It obviates asking, as Justice Frankfurter had to, whether 
the link between the bad evidence and the challenged evidence has “be- 
come so attenuated as to dissipate the taint.” 17 

Data pruning has the related advantage that, if the bad data is cut away 
before the decision-maker learns of it, the decision-maker does not have 
to try not thinking about something that she already knows. Data prun- 
ing avoids the problem that knowledge gained cannot be unlearnt. As 
courts have observed, one “cannot unring a bell.”18 The cognitive problem 
involved here is also sometimes signaled with the command, “Try 
not to think of an elephant.” By deftly handling evidentiary motions, or 
where needed by disciplining trial counsel,19 the judge cuts out the 
elephant before anybody has a chance to ask the jury not to think about 
it. 

Machine learning has a fundamental difficulty with data pruning. To 
make a meaningful difference to the learnt parameters, and thus to the 
eventual outputs when it comes time to execute, you need to strike out 
huge amounts of data. And, if you do that, you no longer have what 
you need to train the machine. Machines are bad at learning from small 
amounts of data; nobody has figured out how to get a machine to learn 
as a human infant can from a single experience. Nor has anybody, at least 
yet, found a way to take a scalpel to datasets; there’s no way, in the state of 
the art, to excise “bad” data reliably for purposes of training a machine.20 
Accordingly, data pruning is anathema to computer scientists.21 
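
One reason the scalpel fails can be made concrete with a toy sketch: removing a sensitive attribute does little if that attribute can be predicted from what remains. The sketch is not drawn from any system discussed in this book. The variable names, the 90 percent agreement between the proxy and the sensitive attribute, and the outcome rates are all assumptions chosen for illustration. The sensitive column is pruned before training, and the trained model’s outputs are then audited by group.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

def make_record():
    group = random.choice([0, 1])  # sensitive attribute (hypothetical)
    # Proxy attribute agrees with the group 90% of the time (assumed).
    proxy = group if random.random() < 0.9 else 1 - group
    # Historical adverse-outcome rates differ by group (assumed: 0.6 vs 0.2).
    rate = 0.6 if group == 1 else 0.2
    adverse = 1 if random.random() < rate else 0
    return group, proxy, adverse

records = [make_record() for _ in range(20000)]

# "Prune" the sensitive column: the learner sees only (proxy, adverse).
pruned = [(proxy, adverse) for _, proxy, adverse in records]

def fit_rates(rows):
    """Simplest possible learner: adverse-outcome rate per proxy value."""
    rates = {}
    for value in (0, 1):
        outcomes = [y for x, y in rows if x == value]
        rates[value] = sum(outcomes) / len(outcomes)
    return rates

model = fit_rates(pruned)

def mean_prediction(group_value):
    """Average model output for one group (group used for audit only)."""
    preds = [model[proxy] for g, proxy, _ in records if g == group_value]
    return sum(preds) / len(preds)

gap = mean_prediction(1) - mean_prediction(0)
print(f"difference in predicted adverse rate between groups: {gap:.2f}")
```

Under these assumptions the gap between the two groups stays well above zero even though the learner never saw the sensitive column: the proxy carries the information in. Pruning the proxy as well would cost the model whatever legitimate signal the proxy holds.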

As for legal proceedings, data pruning is, as we said, a complete answer 
to the problem it addresses—in situations in which the data was pruned 
before a decision-maker sees it. As we noted, however, not all improper 
evidence stays out of the court room. Nor does all knowledge gained from 
improper evidence—fruit of poisonous trees—stay out. Once it enters, 
which is to say once a decision-maker, such as a juror, has learned it, its 
potential for mischief is there. You cannot undo facts. They exist. Expe- 
rience is a fact. Things that have been experienced, knowledge that has 
been gained, do not disappear by fiat. 

A formalist would posit that the only facts that affect the trial process 
are those that the filters of evidentiary exclusion are designed to let in. As 
we have discussed, however, Holmes understood the law, including the 
results of trials, to derive from considerably more diverse material. Juries, 
lawyers, and judges all come with their experiences and their prejudices. 
To Holmes, these were a given, which is why he thought trying to compel 
decision-makers “to testify to the operations of their minds in doing the 
work entrusted to them” was an “anomalous course” and fruitless.22 You 
cannot simply excise the unwanted experience from someone’s mind— 
any more than present-day computer scientists have succeeded in cutting 
the “bad” data from the training dataset. 


8.3 INFERENTIAL RESTRAINT 


What you can do—however imperfect a strategy it may be—is place limits 
on what you allow yourself, the jury, the machine, or the judge to infer 
from the data or the experience. Inferential restraint is familiar in both 
law and machine learning. In efforts to address the problem of bad 
evidence (bad data) in machine learning, most of the energy indeed has been 
directed toward this approach: instead of pruning the data, computer 
scientists are developing methods to restrict the type of inferential outputs 
that the machine is able to generate.23 

In the legal setting, placing restrictions upon inferences has been an 
important strategy for a long time. Judges’ instructions to juries serve 
that purpose; appeals courts recognize that judges’ instructions, properly 
given, have curative effect.24 Judges, in giving curative instructions, 
understand that, even when bad evidence of the kind addressed in 
Silverthorne and Nardone (evidence seized in violation of a constitutional 
right) has been stopped before it gets to the jury, there still might be 
knowledge in the jurors’ minds that could exercise impermissible effects 
on their decision. The jurors might have gained such knowledge from a 
flip word in a lawyer’s closing argument.25 They might have brought it 
in with them off the street in the form of their life experiences; Holmes 
understood juries to have a predilection for doing just that.26 Knowledge 
exists which is to be kept from affecting verdicts, if those verdicts are 
to be accepted as sound. But some knowledge comes to light too late 
to prune. There, instead, a cure is to be applied. In the courtroom, the 
cure takes the form of an instruction from the judge. The instruction tells 
the jurors to restrain the inferences they draw from certain evidence they 
have heard. The restraint is intended to operate in the mental machinery 
of each juror. 

A further situation that calls for inferential restraint is that in which 
some piece of evidence has probative value and may be used for a per- 
missible purpose, but a risk exists that a decision-maker might use the 
evidence for an impermissible purpose. Pruning the evidence would have 
a cost: it would entail losing the probative value. Thus, as judges tell jurors 
to ignore certain experiences that they bring to the court room and cer- 
tain bad evidence or statements that, despite best efforts, have entered 
the court room, so do judges guide jurors in the use of knowledge that 
the court deliberately keeps.27 Here, too, analogous approaches are being 
explored in machine learning.28 


8.4 EXECUTIONAL RESTRAINT 


From Holmes’s judgment in Silverthorne, one discerns that a strategy of 
restraint operates not just on the mental processes of the people involved 
at a given time but also on their future conduct and decisions. Silverthorne 
was a statement to the government about how it was to use knowledge. 
True, the immediate concern was to cut out the bad evidence root and 
branch, to keep it from undermining judicial procedure and breaching a 
party’s constitutional rights. Data pruning is what generations of readers 
of Silverthorne understand it to have done; the principle of the fruit of 
the poisonous tree more widely indeed is read as a call for getting rid of 
problematic inputs.29 

There is more to the principle of the fruit of the poisonous tree, how- 
ever, than data pruning. Consider closely what Holmes said in Silver- 
thorne: “the knowledge gained by the Government’s own wrong cannot 
be used by it in the way proposed” (emphasis added). So the “Govern- 
ment’s own wrong” already had led it to gain certain knowledge. Holmes 
was not proposing the impossible operation of cutting that knowledge 
from the government’s mind. The time for pruning had come and gone. 
Holmes was proposing, instead, to restrain the Government from execut- 
ing future actions that the Government on the basis of that knowledge 
might otherwise have executed: knowledge gained by the Government’s 
wrong was not to be “used by it.” The poisonous tree (to use Frank- 
furter’s expression) addresses a state of the world after the bad evidence 
has already generated knowledge. The effect of that knowledge on future 
conduct is what is to be limited. That is to say, executional restraint, the 
strategy of restricting what action it is permissible to execute, inheres in 
the principle.30 
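
A restraint on execution can be pictured as a gate that stands between an output and any action taken on it. The sketch below is hypothetical throughout: the `Output` record, the `TAINTED` set, and the idea that each output arrives with a reliable list of the inputs it relied on are assumptions made for illustration. In practice, establishing what an output relied on is the question of attenuated causation, and it is rarely so tidy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical label for knowledge gained by an unlawful search.
TAINTED = frozenset({"seized_ledger"})

@dataclass(frozen=True)
class Output:
    decision: str         # what the decision-maker proposes to do
    relied_on: frozenset  # inputs the decision drew on (assumed to be known)

def permitted(output: Output) -> bool:
    """An output may be acted on only if it drew on no tainted input."""
    return output.relied_on.isdisjoint(TAINTED)

def execute(output: Output, act: Callable[[str], str]) -> Optional[str]:
    """Apply `act` to the decision, or do nothing if the gate refuses."""
    if not permitted(output):
        return None  # the output exists, but nobody acts on it
    return act(output.decision)

def carry_out(decision: str) -> str:
    return "carried out: " + decision

blocked = execute(Output("subpoena the ledger", frozenset({"seized_ledger"})), carry_out)
allowed = execute(Output("subpoena the ledger", frozenset({"public_filing"})), carry_out)
```

The gate does not touch the process that produced the output. The knowledge stays where it is; only the step from output to action is restrained.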


8.5 POISONOUS PASTS AND FUTURE GROWTH 


Seen in this, its full sense, the principle of the fruit of the poisonous tree 
has high salience for machine learning, in particular as people seek to use 
machine learning to achieve outcomes society desires. A training dataset 
necessarily reflects a past state of affairs.31 The future will be different. 
Indeed, in many ways, we desire the future to be different, and we work 
toward making it so in particular, desirable ways. But change, as such, 
doesn’t require our intervention. Even if we separate ourselves from our 
desires for the future, from values that we wish to see reflected in the 
society of tomorrow, it is a matter of empirical observation, a fact, that 
the future will be different. Thus, either way, whether or not our values 
enter into it, we err if we rely blindly on a mechanism whose outputs are 
a faithful reflection of the inputs from the past that shaped it. We must 
therefore restrain the conclusions that we draw from those outputs, and 
the actions we take, or else we will be getting the future wrong. 

In machine learning, there is widespread concern about undesirable 
correlations. An example could be supplied by a machine that hands out 
prison sentences. The machine is based on data. The data is a given. 
Americans of African ancestry have received a disproportionate number 
of prison sentences. Trained on that data, a machine will give reliable 
results: it will give results that reliably install the past state of affairs onto 
its future outputs. African-Americans will keep getting a disproportionate 
number of prison sentences. Reliability here has no moral valence in itself; 
it connotes no right or wrong. It is simply a property of the machine. 
The reason society objects to reliability of this kind, when considering an 
example as obvious as the prison sentencing machine, is that this relia- 
bility owes to data collected under conditions that society hopes will not 
pertain in the future. We want to live under new conditions. We do not 
want a machine that perpetuates the correlations found in that data and 
thus perpetuates (if we obey the machine) the old conditions. Some com- 
puter scientists think there may be ways to address this concern about 
undesirable correlations by pruning the training dataset.32 We mentioned 
the technical challenges this presents for machine learning. We speculate 
that the other strategies will be as important in machine learning as they 
have been in law: restrain the inferences and actions that derogate from the 
values we wish to protect. That’s how we increase the chances that we’ll 
get the future right. 


NOTES 


1. See in particular Chapters 6 and 7, pp. 67 ff., 81 ff. 
“used derivatively”: Nardone et al. v. United States, 308 U.S. 338, 341 
(1939, Frankfurter, J.). Cf. noting that “[t]he exclusionary prohibition 
extends as well to the indirect as the direct products of such invasions [of 
a premises in breach of constitutional right]”: Wong Sun v. United States, 
371 U.S. 471, 484 (1963) (Brennan, J.). See further Brown (Gen. Ed.), 
MCCORMICK ON EVIDENCE (2006) § 176 pp. 292-94. 

10. 308 U.S. at 342. 

11. Id. at 341. As to the sufficiency of connection, see Kerr, Good Faith, 
New Law, and the Scope of the Exclusionary Rule, 99 Geo. L. J. 1077, 
1099-1100 (2011). Cf. Devon W. Carbado, From Stopping Black People 
to Killing Black People: The Fourth Amendment Pathways to Police Violence, 
105 Cal. L. Rev. 125, 133-35 (2016). 

12. Undesirable outcomes from a machine learning process are shot through 
with questions of causation—e.g., is it appropriate to hold accountable 
the computer scientist who engineered a machine learning system, when 
an undesirable outcome is traceable back to her conduct if at all then only 
by the most attenuated lines? Regarding the implications for tort law, see, 
e.g., Gifford, Technological Triggers to Tort Revolutions: Steam Locomotives, 
Autonomous Vehicles, and Accident Compensation, 11 J. Tort Law 71, 
143 (2018); Haertlein, An Alternative Liability System for Autonomous 
Aircraft, 31 AIR & SPACE L. 1, 21 (2018); Scherer, Regulating Artifi- 
cial Intelligence Systems: Risks, Challenges, Competences, and Strategies, 29 
Harv. J. L. TECH. 353, 363-366 (2016); Calo, Open Robotics, 70 MD. 
L. Rev. 571, 602 (2011). Writers have addressed causation problems as 
well in connection with international legal responsibility and autonomous 
weapons: see, e.g., Burri, International Law and Artificial Intelligence, 
60 GYIL 91, 101-103 (2017); Sassóli, Autonomous Weapons and Inter- 
national Humanitarian Law: Advantages, Open Technical Questions and 
Legal Issues to Be Clarified, 90 INT’L L. STUD. 308, 329-330 (2014). 

13. In the computer science literature, the expression “data pruning” has been 
associated with cleaning noisy datasets in order to improve performance. 
See, e.g., Anelia Angelova, Yaser S. Abu-Mostafa & Pietro Perona, Prun- 
ing Training Sets for Learning of Object Categories. CVPR Conference 
(2005), San Diego, June 20-25, 2005: vol. 1 IEEE 494-501. 

14. Mapp v. Ohio, 367 U.S. 643, 656 (1961). 

15. See for example Meriani, Digital Platforms and the Spectrum of Data 
Protection in Competition Law Analysis, 38(2) EUR. COMPET. L. REV. 89, 
94-95 (2017); Quelle, Enhancing Compliance Under the General Data 
Protection Regulation: The Risky Upshot of Accountability- and Risk-Based 
Approach, 9 EUR. J. RISK REGUL. 502, 524-525 (2018). 

16. “Bad evidence” is thus of broadly two types. (i) Evidence may be bad 
because the manner of its collection is undesirable. That type of bad 
evidence might have raised no problem, if its collection had not been 
tainted. (ii) The other type is bad, irrespective of how the evidence 
collector behaved. It is bad, because it poses the risk of an invidious influence 
on the decision process itself. 

17. Doctrinal writers on evidence have struggled to articulate how to 
determine whether the link between bad evidence and challenged evidence is 
attenuated enough to “dissipate the taint.” Clear enough is the existence 
of an exception to the fruit of the poisonous tree. Unclear is when the 
exception applies. Here the main treatise on American rules of evidence 
has a go at an answer: 


This exception... does not rest on the lack of an actual causal link 
between the original illegality and the obtaining of the challenged 
evidence. Rather, the exception is triggered by a demonstration that 
the nature of that causal link is such that the impact of the original 
illegality upon the obtaining of the evidence is sufficiently minimal 
that exclusion is not required despite the causal link. Brown (Gen. 
Ed.), MCCORMICK ON EVIDENCE (2006) § 179 p. 297. 


Note the circularity: the exclusion “exception is triggered” (i.e., the exclu- 
sion is not required) when the “exclusion is not required.” The hard ques- 
tion is what precisely are the characteristics that give a causal link such a 
“nature” that it is “sufficiently minimal.” 

18. Dunn v. United States, 307 F.2d 883, 886 (Gewin, J., 5th Cir., 1962). 
Courts outside the U.S. have used the phrase too: Kung v. Peak Potentials 
Training Inc., 2009 BCHRT 154, 2009 CarswellBC 1147 para 11 (British 
Columbia Human Rights Tribunal, Apr. 23, 2009). 

19. See for example Fuery et al. v. City of Chicago, 900 F. 3d 450, 457 
(Rovner, J., 7th Cir., 2018). 

20. Broadly speaking, there are two ways to prune a dataset: removing items 
from the dataset (rows) for example to remedy problems of unbalanced 
representation, or removing a sensitive attribute from the dataset (a col- 
umn). It has been widely observed that removing a sensitive attribute is 
no use, if that attribute may be more or less reliably predicted from the 
remaining attributes. Removing items is also tricky: for example, the cura- 
tors of the ImageNet dataset, originally published in 2009 (see Prologue, 
p. xiii, n. 23) were as of 2020 still playing whack-a-mole to remedy issues 
of fairness and representation. See Yang et al. (2019). 

21. Of course we don’t mean that computer scientists find the goal that 
motivates data pruning efforts to be antithetical morally or ethically. Instead, 
data pruning, a blunt instrument perhaps acceptable as a stop-gap, is at 
odds with how machine learning works. 

22. Coulter et al. v. Louisville & Nashville Railroad Company, 25 S.Ct. at 
345, 196 U.S. at 610 (1905). 
23. For a cutting-edge illustration, see Madras et al. (2018). What is 
particularly interesting about their approach is that, in order to guarantee that 
the machine learning system’s inferences are unbiased against individuals 
with some protected attribute x, that attribute must be available to the 
machine. This illuminates why computer scientists are uneasy about data 
pruning. 

24. See Leonard (ed.), NEW WIGMORE (2010) § 1.11.5 p. 95 and see id. 
95-96 n. 57 for judicial comment. The standard for establishing that lim- 
iting instructions have failed is exacting. See, e.g., Encana Oil & Gas 
(USA) Inc. v. Zaremba Family Farms, Inc. et al., 736 Fed. Appx. 557, 
568 (Thapar, J., 6th Cir., 2018). 

25. A problem addressed repeatedly by U.S. courts. See, e.g., Dunn v. United 
States, 307 F.2d 883, 885-86 (Gewin, J., 5th Cir. 1962); McWhorter 
v. Birmingham, 906 F.2d 674, 677 (Per Curiam, 11th Cir. 1990). A 
substantial literature addresses jury instructions, including from empirical 
angles. See, e.g., Mehta Sood, Applying Empirical Psychology to Inform 
Courtroom Adjudication—Potential Contributions and Challenges, 130 
Harv. L. Rev. F. 301 (2017). 

26. See Chapter 7, p. 81 ff. See also Liska, Experts in the Jury Room: When 
Personal Experience is Extraneous Information, 69 STAN. L. Rev. 911 
(2017). 

27. Appeals courts consider such instructions frequently. For a recent example, 
see United States v. Valois, slip op. pp. 13-14 (Hull, J., 2019, 11th 
Cir.). Cf. Namet v. U.S., 373 U.S. 179, 190, 83 S.Ct 1151, 1156 n. 10 
(Stewart, J., 1963). 

28. See Madras et al., op. cit. 

29. That reading is seen in judgments, including (perhaps particularly) of 
foreign courts when they observe that “fruit of the poisonous tree” is not 
part of their law. See, e.g., Z. (Z.) v. Shafro, 2016 ONSC 6412, 2016 
CarswellOnt 16284, para 35 (Kristjanson, J., Ontario Superior Court of 
Justice, Oct. 14, 2016). Some foreign courts do treat the doctrine as part 
of their law and apply a similar reading. See, e.g., Dela Cruz v. People 
of the Philippines (2016) PHSC 182 (Leonen, J., Philippines Supreme 
Court, 2016), with precedents cited at Section III, n. 105. See the com- 
parative law treatment of the principle by Thaman, “Fruits of the Poisonous 
Tree” in Comparative Law, 16 Sw. J. INT’L L. 333 (2010). 

30. Executional restraint and inferential restraint, as we stipulate the concepts, 
in some instances overlap, because an execution that is to be restrained 
might be a mental or computational process of inference. Overlap is 
detectable in Silverthorne Lumber. The Government, Holmes said, was to 
be restrained from how it used the knowledge that it had gained through 
an illegal search and seizure. The use from which Holmes called the Gov- 
ernment to be restrained was equally the Government’s reasoning about 
where to go in search of evidence; and the physical action it executes in the 
field. Restraint has both aspects as well where one is concerned, instead of 
with preventing people from using knowledge to generate more knowl- 
edge, with preventing a machine learning system from using an input to 
generate more outputs. The overlap arises in machine learning between 
the two variants of restraint, because machine learning systems (at least 
in the current state of the art) don’t carry on computing with new 
inputs unless some action is taken to get them to execute. The execu- 
tional restraint would be to refrain from switching on the machine (or, if 
its default position is “on,” then to switch the machine off). 

The overlap is also significant where human institutions function under 
procedures that control who gets what information and for what purposes. 
Let us assume that there is an institution that generates decisions with a 
corporate identity—i.e., decisions that are attributable to the institution, 
rather than to any one human being belonging to it. Corporations and 
governments are like that. Let us also assume that, in order to generate a 
decision that bears the corporate identity, two or more human beings must 
handle certain information; and one of them, or some third person, has 
the power to withhold that information. The person having the withhold- 
ing power may place a restraint upon the institution: she may withhold 
the information and, thus, the institution cannot carry out the decision 
process. The restraint in this setting has overlapping characteristics. It is 
inferential, in that it restrains the decision process; it is executional, in that 
it restrains the actions of the individual constituents of the institution. 

31. See Chapter 3, p. 37. 

32. Chouldechova & Roth, op. cit., Section 3.4 p. 7. Cf. Paul Teich, Artificial 
Intelligence Can Reinforce Bias, FORBES (Sept. 24, 2018) (referring to 
experts who “say AI fairness is a dataset issue”). 
CHAPTER 9 


From Holmes to AlphaGo 


Holmes in The Path of the Law asked “What constitutes the law?” and 
answered that law is nothing more than prophecies of what the courts will 
do. As we discussed in Chapter 5, this is not just the trivial observation 
that one of the jobs of a lawyer is to predict the outcome of a client’s 
case: it is the insight that growth and development of the law itself—the 
path of the law—is constituted through predictive acts. 

Holmes was preoccupied throughout his legal career with understand- 
ing the law as an evolving system. Kellogg, in a recent study of the roots 
of Holmes’s thinking,1 traces this interest to the period 1866-1870, 
Holmes’s first years as a practicing lawyer, and to his reading of John 
Stuart Mill on the philosophy of induction, and of William Whewell 
and John Herschel on the role of induction in scientific theory-building. 
Holmes’s original insight was that the development of law is a process of 
social induction: it is not simply logical deduction from axioms laid down 
in statutes and doctrine as formalists would have it; it is not simply the 
totality of what judges have done as the realists would have it. Instead, 
law develops through agents embedded in society who take actions that 
depend on and contribute to the accumulating body of experience, and 
it involves social agents who through debate are able to converge toward 
entrenched legal doctrine. 

In the standard paradigm for machine learning, there is no counterpart 
to the first part of Holmes’s insight of social induction—i.e., to the role 
of active agents embedded in society. The standard paradigm is that there 
is something for the machine to learn, and this “something” is data, i.e. 
given, and data does not accumulate through ongoing actions. This is why 
the field is called “machine learning” rather than “machine doing”! Even 
in systems in which a learning agent’s actions affect its surroundings, for 
example a self-driving car whose movements will make other road-users 
react, the premise is that there are learnable patterns about how others 
behave, that learning those patterns is the goal of training, and that 
training should happen in the factory rather than on the street. 

There is however a subfield of machine learning, called reinforcement 
learning, in which the active accumulation of data plays a major role. 
“AlphaGo,”2 the AI created by DeepMind, which in 2016 won a historic 
victory against top-ranking (human) Go player Lee Sedol, is a product of 
reinforcement learning. In this chapter we will describe the links between 
reinforcement learning and Holmes’s insight that law develops through 
the actions of agents embedded in society. 

The second part of Holmes’s insight concerns the process whereby 
data turns into doctrine, the “continuum of inquiry.”3 As case law 
accumulates, there emerge clusters of similar cases, and legal scholars, 
examining these clusters, hypothesize general principles. Holmes famously said 
that “general propositions do not decide concrete cases,” but he also saw 
law as the repository of the “ideals of society [that] have been strong 
enough to reach that final form of expression.” In other words, legal 
doctrine is like an accepted scientific theory4: it provides a coherent 
narrative, and its authority comes not from prescriptive axioms but rather 
from its ability to explain empirical data. Well-settled legal doctrine arises 
through a social process: it “embodies the work of many minds, and has 
been tested in form as well as substance by trained critics whose practical 
interest is to resist it at every step.”⁵

There is nothing in machine learning that corresponds to this sec- 
ond aspect of Holmes’s social induction, to the social dialectic whereby 
explanations are generated and contested and some explanation eventually
becomes entrenched. In the last part of this chapter we will discuss
the role of legal explanation, outline some problems with explainability
in machine learning, and suggest how machine learning might learn
from Holmes.


9.1 ACCUMULATING EXPERIENCE 


According to Holmes, “The growth of the law is very apt to take place in 
this way: two widely different cases suggest a general distinction, which 
is a clear one when stated broadly. But as new cases cluster around the
opposite poles, and begin to approach each other [...] at last a
mathematical line is arrived at by the contact of contrary decisions.”⁶


[Figure: scatter of datapoints, some marked “+” and others marked “O”.]


Holmes’s metaphor, of a mathematical line drawn between cases with 
contrary decisions, will be very familiar to students of machine learning, 
since almost any introductory textbook describes machine-learning clas- 
sification using illustrations such as the figure above. In the figure, each 
datapoint is assigned a mark according to its ground-truth label,⁷ and
the goal of training a classifier is to discover a dividing line. DeepMind’s 
AlphaGo can be seen as a classifier: it is a system for classifying game- 
board states according to which move will give the player the highest 
chance of winning. During training, the system is shown many game- 
board states, each annotated according to which player eventually wins 
the game, and the goal of training is to learn dividing lines. 
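
The idea of learning a dividing line can be made concrete with a small
sketch. The Python below is our own illustration and has nothing to do with
how AlphaGo is built: it applies the classic perceptron rule to six invented
datapoints, three marked +1 (the “+” of the figure) and three marked -1 (the
“O”). The names POINTS, train_perceptron and classify are hypothetical,
chosen for this example only.

```python
# Six invented datapoints: (x, y) coordinates with a ground-truth label.
# +1 plays the role of "+" in the figure, -1 the role of "O".
POINTS = [
    ((2.0, 3.0), 1), ((3.0, 3.5), 1), ((2.5, 4.0), 1),
    ((0.5, 1.0), -1), ((1.0, 0.5), -1), ((0.0, 0.0), -1),
]


def train_perceptron(points, epochs=1000):
    """Search for a line w1*x + w2*y + b = 0 that separates the two labels."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in points:
            # A point is misclassified if it lies on the wrong side of the line.
            if label * (w1 * x + w2 * y + b) <= 0:
                # Nudge the line towards classifying this point correctly.
                w1 += label * x
                w2 += label * y
                b += label
                mistakes += 1
        if mistakes == 0:  # every point is on its correct side: stop
            break
    return w1, w2, b


def classify(weights, point):
    """Return +1 or -1 according to which side of the line the point falls on."""
    w1, w2, b = weights
    x, y = point
    return 1 if w1 * x + w2 * y + b > 0 else -1


weights = train_perceptron(POINTS)
```

Once trained, classify(weights, (3.0, 4.0)) places a new datapoint on one
side of the line or the other. The line is settled only by the points that
were at some stage misclassified, which is one way of reading “the contact
of contrary decisions.”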

Holmes was interested not just in the dividing lines but in the accumu- 
lation of new cases. Some new cases are just replays with variations in the 
facts, Cain killing Abel again and again through history. But Holmes had 
in mind new cases arising from novel situations, where legal doctrine has 
not yet drawn a clear line. The law grows through a succession of partic- 
ular legal disputes, and in no situation would there be a meaningful legal 
dispute if the dividing line were clear. Actors in the legal system adapt 
their actions based on the body of legal decisions that has accumulated,
and this adaptation thus affects which new disputes arise. New disputes
will continue to arise to fill out the space of possible cases, until
eventually it becomes possible to draw a line “at the contact of contrary
decisions.” Kellogg summarizes Holmes’s thinking thus: “he reconceived
logical induction as a social process, a form of inference that engages adaptive
action and implies social transformation.”⁸

Machine learning also has an equivalent of adaptive action. The train- 
ing dataset for AlphaGo was not given a priori: it was generated during 
training, by the machine playing against itself. To be precise, AlphaGo was 
trained in three phases. The first phase was traditional machine learning, 
from an a priori dataset of 29.4 million positions from 160,000 games 
played by human professionals. In the second phase, the machine was 
refined by playing against an accumulating library of earlier iterations of 
itself, each play adding a new iteration to the library. The final iteration of 
the second-phase machine was played against itself to create a new dataset 
of 30 million matches, and in the third phase this dataset was used as 
training data for a classifier (that is to say, the machine in the third phase 
trains on a given dataset, which, like the given dataset in the first phase, 
is not augmented during training). The trained classifier was the basis for 
the final AlphaGo system. DeepMind later created an improved version, 
AlphaGo Zero,⁹ which essentially only needed the second phase of
training, and which outperformed AlphaGo. The key feature of reinforcement
learning, seen in both versions, is that the machine is made to take actions 
during training, based on what it has learnt so far, and the outcomes of 
these actions are used to train it further—Kellogg’s “adaptive action.” 
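
A minimal sketch of this loop, under heavy simplification, may help. Instead
of Go, the “game” here is a choice among three moves with fixed winning
chances that are hidden from the learner, and the learner is a simple
epsilon-greedy rule rather than anything resembling AlphaGo’s neural
networks. The function train_by_acting, its parameters, and the winning
chances are all assumptions made for illustration.

```python
import random


def train_by_acting(win_prob, steps=10_000, epsilon=0.1, seed=0):
    """Learn move values from experience that the learner itself generates."""
    rng = random.Random(seed)
    n_moves = len(win_prob)
    value = [0.0] * n_moves  # current estimate of each move's winning chance
    count = [0] * n_moves    # how often each move has been tried so far
    experience = []          # the dataset, which accumulates during training

    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally try a random move, so that no move goes unexplored.
            move = rng.randrange(n_moves)
        else:
            # Otherwise act on what has been learnt so far.
            move = max(range(n_moves), key=lambda m: value[m])

        # The outcome of the action becomes a new datapoint...
        won = 1 if rng.random() < win_prob[move] else 0
        experience.append((move, won))

        # ...which is used at once to refine the estimate (a running average).
        count[move] += 1
        value[move] += (won - value[move]) / count[move]

    return value, experience


# Hypothetical winning chances for the three moves, unknown to the learner.
value, experience = train_by_acting([0.2, 0.5, 0.8])
best_move = max(range(3), key=lambda m: value[m])
```

The point of the sketch is that experience starts empty: no dataset is
given in advance, and what the learner comes to know depends on what it
chose to do along the way.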

Holmes says that the mathematical line is arrived at “by the contact 
of contrary decisions.” Similarly, AlphaGo needed to be shown sufficient
diversity of game-board states to fill out the map, so that it could learn to
classify any state that it might plausibly come across during play. In law
the new cases arise through fractiousness and conflict (“man’s destiny is
to fight”¹⁰), whereas for AlphaGo the map was filled out by artificially
adding noise to the game-play dataset.

Holmes has been criticized for putting forward a value-free model of
the law—he famously defined truth “as the majority vote of that nation 
that can lick all the others.”¹¹ Kellogg absolves Holmes of this charge: he
argues that Holmes saw law as a process of social inquiry, using the mech- 
anism of legal disputes to figure out how society works, similar to how 
science uses experiments to figure out how nature works. The dividing
lines that the law draws are therefore not arbitrary: “Any successful
conclusions of social inquiry must, in an important respect, conform with the
world at large. Social inductivism does not imply that the procedures and
ends of justification are relativist products of differing conventions.”¹²
Likewise, even though the training of AlphaGo is superficially relativist (it
was trained to classify game-board states by the best next move, assuming
that its opponent is AlphaGo), it is nonetheless validated by objective game
mechanics: pitted against Lee Sedol, one of the top human Go players
in the world, AlphaGo won.


9.2 LEGAL EXPLANATIONS, 
DECISIONS, AND PREDICTIONS 


“It is the merit of the common law,” Holmes wrote, “that it decides the
case first and determines the principle afterwards.”¹³ Machine learning
has excelled (and outdone the ingenuity of human engineers) at mak- 
ing decisions, once decision-making is recast as a prediction problem as 
described in Chapter 5. This success, however, has come at the expense 
of explainability. Can we learn how to explain machine learning decisions, 
by studying how common law is able to determine the principle behind a 
legal decision? 

In the law, there is a surfeit of explanation. Holmes disentangled three 
types: (i) the realist explanation of why a judge came to a particular deci- 
sion, e.g. because of an inarticulate major premise, (ii) the formalist expla- 
nation that the judge articulates in the decision, and (iii) explanation in 
terms of principles. Once principles are entrenched then the three types 
of explanation will tend to coincide, but in the early stages of the law they 
often do not. Principles reflect settled legal doctrine that “embodies the 
work of many minds and has been tested in form as well as substance by 
trained critics whose practical interest is to resist it at every step.” They 
arise through a process of social induction, driven forwards not just by 
new cases (data) but also by contested explanations. 

To understand where principles come from, we therefore turn to
judicial decisions. (In legal terminology, decision is used loosely¹⁴ to refer
both to the judgement and to the judge’s explanation of the judgement.) 

Here is a simple thought experiment. Consider two judges A and B. 
Judge A writes decisions that are models of clear legal reasoning. She takes 
tangled cases, cases so thorny that hardly any lawyer can predict the
outcome, and she is so wise and articulate that her judgments become widely
relied upon by other judges. Judge B on the other hand writes garbled
decisions. Eventually a canny lawyer realizes that this judge finds in favor 
of the defendant after lunch, and in favor of the plaintiff at other times of 
day (her full stomach is the inarticulate major premise). Judge B is very 
predictable, but her judgments are rarely cited and often overturned on 
appeal. 

If we think of law purely as a task of predicting the outcome of 
the next case, then judgments by A and by B are equivalent: they are 
grist for the learning mill, data to be mined. For this task, the quality 
of their reasoning is irrelevant. It is only when we look at the develop- 
ment of the legal system that reasoning becomes significant. Judge A has 
more impact on future cases, because of her clear explanations. “[T]he 
epoch-making ideas,” Holmes wrote, “have come not from the poets but 
from the philosophers, the jurists, the mathematicians, the physicists, the 
doctors—from the men who explain, not from the men who feel.”¹⁵

Our simple thought experiment might seem to suggest that it is rea- 
soning, not prediction, that matters for the growth of the law. What then 
of Holmes’s famous aphorism, that prophecy is what constitutes the law? 
Alex Kozinski, a U.S. Court of Appeals judge who thought the whole 
idea of the inarticulate major premise was overblown, described how judges
write their decisions in anticipation of review: 


If you’re a district judge, your decisions are subject to review by three 
judges of the court of appeals. If you are a circuit judge, you have to per- 
suade at least one other colleague, preferably two, to join your opinion. 
Even then, litigants petition for rehearing and en banc review with annoy- 
ing regularity. Your shortcuts, errors and oversights are mercilessly paraded 
before the entire court and, often enough, someone will call for an en banc 
vote. If you survive that, judges who strongly disagree with your approach 
will file a dissent from the denial of en banc rehearing. If powerful enough, 
or if joined by enough judges, it will make your opinion subject to close 
scrutiny by the Supreme Court, vastly increasing the chances that certiorari 
will be granted. Even Supreme Court Justices are subject to the constraints 
of colleagues and the judgments of a later Court.¹⁶


Thus judges, when they come to write a decision, are predicting how 
future judges (and academics, and agents of public power, and public 
opinion) will respond to their decisions. Kozinski thus brings us back to 
prophecy and demonstrates the link with explanations “tested in form as 
well as substance by trained critics.”


9.3 GÖDEL, TURING, AND HOLMES


We have argued that the decision given by a judge is written in anticipa- 
tion of how it will be read and acted upon by future judges. The better 
the judge’s ability to predict, the more likely it is that this explanation will 
become part of settled legal doctrine. Thus judges play a double role in 
the growth of the law: they are actors who make predictions; and they are 
objects of prediction by other judges. 

There is nothing in machine learning that is analogous, no system in 
which the machine is a predictor that anticipates future predictors. This 
self-referential property does however have an interesting link to classic 
algorithmic computer science. Alan Turing is well known in popular
culture for his test for artificial intelligence.¹⁷ Among computer scientists he
is better known for inventing the Turing Machine, an abstract mathemat- 
ical model of a computer that can be used to reason about the nature and 
limits of computation. He used this model to prove in 1936¹⁸ that there
is a task that is impossible to solve on any computer: the task of deciding 
whether a given algorithm will eventually terminate or whether it will get 
stuck in an infinite loop. This task is called the “Halting Problem.” A key 
step in Turing’s proof was to take an algorithm, i.e. a set of instructions 
that tell a computer what to do, and represent it as a string of symbols 
that can be treated as data and fed as input into another algorithm.
Turing here was drawing on the work of Kurt Friedrich Gödel, who in 1930
developed the equivalent tool for reasoning about statements in
mathematical logic. In this way, Gödel and later Turing were able to prove
fundamental results about the limits of logic and of algorithms. They ana- 
lyzed mathematics and computation as self-referential systems. 
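
Two ideas in this paragraph lend themselves to a short illustration. The
first is that a program held as a string of symbols is at once data and a
set of instructions. The second is the shape of Turing’s argument: any
purported halting-decider can be defeated by a program built to do the
opposite of whatever the decider predicts about it. The sketch below is a
loose analogue only. Python functions stand in for Turing’s encoded
machines, and the names make_contrarian, claims_to_halt and pessimist are
hypothetical.

```python
# A program held as a string: it can be measured like any other data...
source = "def double(n):\n    return 2 * n\n"
length_as_data = len(source)

# ...and it can also be run as a set of instructions.
namespace = {}
exec(source, namespace)
result = namespace["double"](21)


def make_contrarian(claims_to_halt):
    """Build a program that defies `claims_to_halt`, a purported halting-decider.

    `claims_to_halt(program, data)` is supposed to return True exactly when
    `program(data)` would eventually stop.
    """
    def contrarian(program):
        if claims_to_halt(program, program):
            while True:      # predicted to halt, so loop forever
                pass
        return "halted"      # predicted to loop, so halt at once
    return contrarian


def pessimist(program, data):
    """A (necessarily flawed) candidate decider that always answers 'it loops'."""
    return False


contrarian = make_contrarian(pessimist)
# Feed the contrarian its own description: it halts, so the pessimist was wrong.
outcome = contrarian(contrarian)
```

A decider that answered True about the contrarian would be wrong in the
other direction, since the contrarian would then loop forever; for obvious
reasons we do not run that case. Whatever the decider answers, it is
mistaken about this one program, which is the heart of the impossibility
result.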

In Turing’s work, an algorithm is seen as a set of instructions for pro- 
cessing data, and, simultaneously, as data which can itself be processed. 
Likewise, in the law, the judge is an agent who makes predictions, and, 
simultaneously, an object for prediction. Through these predictions, set- 
tled legal principles emerge; in this sense the law can be said to be consti- 
tuted by prediction. Machine learning is also built upon prediction—but 
machine learning is not constituted by prediction in the way that law is. 
We might say that law is post-Turing while machine learning is still
pre-Turing.¹⁹


9.4 WHAT MACHINE LEARNING CAN
LEARN FROM HOLMES AND TURING


Our point in discussing legal explanation and self-referential systems is 
this: 


(i) social induction in the law is able to produce settled legal prin- 
ciples, i.e. generally accepted explanations of judicial decision- 
making; 

(ii) the engine for social induction in the law is prediction in a self- 
referential system; 

(iii) machine learning has excelled (and outdone human engineering 
ingenuity) at predictive tasks for which there is an empirical mea- 
sure of success; 

(iv) if we can combine self-reference with a quantitative predictive task, 
we might get explainable machine learning decisions. 


In the legal system, the quality of a decision can be evaluated by measur- 
ing how much it is relied on in future cases, and this quality is intrinsically 
linked to explanations. Explanations are evaluated not by “are you happy 
with what you’ve been told?”, but by empirical consequences. Perhaps 
this idea can be transposed to machine learning, in particular to rein- 
forcement learning problems, to provide a metric for the quality of a pre- 
diction. This would give an empirical measure of success, so that the tools 
that power machine learning can be unleashed, and “explainability” will 
become a technical challenge rather than a vague and disputed laundry 
list. Perhaps, as in law, the highest quality machine learning systems will 
be those that can internalize the behavior of other machines. Machines 
that do that would all the more trace a path like that of Holmes’s law. 

These are speculative directions for future machine learning research, 
which may or may not bear fruit. Nonetheless, it is fascinating that 
Holmes’s understanding of the law suggests such avenues for research 
in machine learning.


CHAPTER 10


Conclusion 


In every department of knowledge, what wonderful things have been done to solicit 
the interest and to stir the hopes of new inquirers! Every thing is interesting when 
you understand it, when you see its connection with other things, and, in the end, 
with all other things. 


Oliver Wendell Holmes, Jr., Remarks at the Harvard Commencement 
Dinner (June 22, 1880)¹


This book has posited a connection between two seemingly remote 
things: between a late nineteenth century revolution in law and an early 
twenty-first century revolution in computing. We think that the jurist on
whose work we have drawn, had he been transported to the present day,
would have been open to the connection. Holmes’s early milieu had been one
of science, medicine, and letters, these being fields in which his father
had held a prominent place and in which their city had, in Holmes’s youth,
held the preeminent place in America. Leading lights of nineteenth
century philosophy and science numbered among Holmes’s friends and 
interlocutors at home and abroad in the years immediately after the Civil 
War. Holmes continued throughout his life to engage with people whom 
today we would call technologists. His interest in statistics and in the nat- 
ural sciences was broad and deep and visible in Holmes’s vast output as 
a scholar and a judge. Lawyering and judging, to Holmes, were jobs but 
also objects to be searched for deeper understanding.

We hope that in the preceding chapters, by considering “its connection
with other things,” we have contributed to a deeper understanding
of where machine learning belongs in the wider currents of modern
thought. The common current that has shaped both legal thought and 
computer science is probability. It is a strikingly modern concept. As we
have recalled, its origins go back no further than the mid-seventeenth
century. Its impact has been felt in one field after another, though hardly all
at once. Law was present at its origins, though it took over two centuries 
before a new jurisprudence would take shape under its influence. Com- 
puting, too, did not begin as an operation in probability and statistics, 
but now probability and statistics are the indispensable core of machine 
learning. Thus both law and computing have undergone a shift from their 
earlier grounding in deductive logic: they have taken an inductive turn, 
based on pattern finding and prediction. 

But, to conclude, let us turn away from intellectual history and look 
instead to the future. 


10.1 HOLMES AS FUTURIST 


Holmes, notwithstanding the strains of fatalism evident in his words, was 
fascinated by the potential for change, in particular change as driven by 
science and technology. Speaking in 1895 in honor of C. C. Langdell, 
that leading expositor of legal formalism, Holmes stated with moderate 
confidence that a march was on toward a scientific basis for law and that 
it would continue: “The Italians have begun to work upon the notion 
that the foundations of the law ought to be scientific, and, if our civiliza- 
tion does not collapse, I feel pretty sure that the regiment or division that 
follows us will carry that flag.” With the reference to “[t]he Italians” 
Holmes seems to have had in mind the positivism that was prevalent in 
legal theory in late nineteenth century Italy³; to the possibility of
civilizational collapse, the pessimism prevalent generally in European philosophy
at the time. In 1897, in Law in Science and Science in Law, Holmes 
hedged his prediction, but he continued to see contemporary advances 
in science and technology as pertinent to the organization of public life 
in the widest sense: “Very likely it may be that with all the help that 
statistics and every modern appliance can bring us there never will be a 
commonwealth in which science is everywhere supreme.”⁵ To entertain
the possibility of a technological supremacy arising over law, even if to
doubt that a scientific revolution in law would ever be complete, was still
to place the matter in high relief.

Technological change had affected society at large for generations by 
the time Holmes wrote Law in Science and Science in Law. However, 
advances were accelerating and, moreover, in specific technical domains 
technology was interweaving itself with public order in unprecedented 
ways. This was the decade in which the U.S. Census Bureau first used a 
punch card machine with electric circuits to process data. The machine, 
known as the Hollerith Tabulator after its inventor, Herman Hollerith, 
was the forerunner of modern data processing. Hollerith’s company, the 
Tabulating Machine Company, was one of several later amalgamated to 
form the company that was eventually re-named IBM. The basic con- 
cept of the machine remained the cornerstone of data processing until 
the 1950s.⁷ By 1911 (when Hollerith sold the Tabulating Machine
Company), Hollerith Tabulators already had been used to process cen- 
sus data in the United States, United Kingdom, Norway, Denmark, 
Canada, and the Austrian and Russian Empires. Railroads, insurance com- 
panies, department stores, pharmaceutical companies, and manufacturers 
employed Hollerith machines as well. The Hollerith Tabulator lowered 
the cost of handling large quantities of data and accelerated the work; 
the SCIENTIFIC AMERICAN, which ran an article on the machine in its 
August 30, 1890 edition, attributed the “early completion of the
[census] count... to the improved appliances by which it was executed.”⁹

The Hollerith machines did more than increase the efficiency of 
the performance of existing tasks, however. Because they enabled users 
to interrogate datasets in ways that earlier were prohibitively time- 
consuming—for example, asking how many people in the year 1900 in 
Cincinnati were male blacksmiths born in Italy—the Hollerith machines 
opened the door to new uses for data, not just more efficient head counts. 
The SCIENTIFIC AMERICAN referred to the “elasticity of function” that 
the machines enabled.¹⁰ Hollerith himself was referred to as the first
“statistical engineer.”¹¹

Holmes was not excited about putting his hands on the various inno- 
vations that technologists were bringing to the market; he doubted that 
his house would have had electricity or a telephone if his wife had not
had them installed.¹² It would be surprising, however, if Holmes had not
known of the Hollerith machine.¹³ The edition of SCIENTIFIC AMERICAN
containing the article about Hollerith and the census featured
illustrations of the tabulator at work on its cover. The same periodical had
run an article four years earlier on Holmes’s father.¹⁴ Holmes was a
paid subscriber.¹⁵ He also encountered technological innovations in the
course of his principal employment: Holmes authored a number of the
Supreme Court’s decisions in patent matters¹⁶ (none, it seems,
concerning Hollerith, though the “statistical engineer” was no stranger to
intellectual property disputes¹⁷). Curiously enough, Hollerith’s headquarters
and workshop were in a building in the Georgetown part of Washington,
DC not many blocks from where Holmes lived after moving to the
capital,¹⁸ and the building in which the Census employed a large array of the
machines was a short block off the most direct route (2.2 miles) between 
the Capitol (which then housed the Supreme Court) and Holmes’s house 
at 1720 I Street, NW. Contemporaries remarked on the distinctive chimes 
that bells on the machines made, a noise which rose to a clamor in the 
building and which could be heard on the street below.¹⁹ Holmes was a
keen rambler whose peregrinations in Boston, Washington, and elsewhere
took him in pursuit of interesting things.²⁰ Whether or not the Hollerith
Tabulator was the appliance Holmes had in mind in Law in Science and 
Science in Law, technology was in the air. The emergence of modern 
bureaucracy in the early nineteenth century had been associated with an 
ambition to put public governance on a scientific basis;²¹ the emergence
of machines in the late nineteenth century that process data inspired new
confidence that such an ambition was achievable.²² To associate the
commonwealth and its governance with statistics and “modern appliance,” as
Holmes did, was very much of a piece with the age. 

Holmes’s interest in technology induced him to maintain wide-ranging 
contacts, some of them rather idiosyncratic. A fringe figure named 
Franklin Ford²³ for a number of years corresponded with Holmes about
the former’s theories regarding news media and credit institutions. Ford 
imagined a centralized clearing mechanism that would give universal 
access to all news and credit information, an idea today weirdly evoca- 
tive of the world wide web; and he said that this mechanism would sup- 
plant the state and its legal institutions, a prediction likewise evocative 
of futurists today who say, e.g., the blockchain will bring about the end 
of currencies issued under government fiat. Holmes continued the cor- 
respondence for years, telling Ford at one point that he (Ford) was “en- 
gaged with the large problems of the sociologist, by whom all social forces 
are equally to be considered and who, of course, may find and will find 
forces and necessities more potent than the theoretical omnipotence of 
the technical lawgiver.”²⁴ Holmes evidently continued to speculate that
law might in time give way to the experience embodied in “all social
forces” which, in the context of that correspondence, suggested “social 
forces” mediated in some way by technology. In his correspondence with 
Franklin Ford, whose schemes aimed at the dissemination and use of data, 
Holmes seemed to intuit that, if machines came to marshal data in even
“more potent” ways, civilization-changing effects might follow. 


Study of Justice and Mrs. Oliver Wendell Holmes’s Washington, DC residence (Harris & Ewing, 
Washington, DC, United States [photographer] 1935; Harvard Law School Library, Historical & Special 
Collections; original at Library of Congress, Prints & Photographs Division, Lot 10304)

In a flight of fancy, in a 1913 speech, Holmes went so far as to
speculate about the evolution of the species:


I think it not improbable that man, like the grub that prepares a chamber 
for the winged thing it never has seen but is to be—that man may have 
cosmic destinies that he does not understand... I was walking homeward 
on Pennsylvania Avenue near the Treasury, and as I looked beyond Sher- 
man’s Statue to the west the sky was aflame with scarlet and crimson from 
the setting sun. But, like the note of downfall in Wagner’s opera, below 
the sky line there came from little globes the pallid discord of the electric 
lights. And I thought to myself the Götterdämmerung will end, and from
those globes clustered like evil eggs will come the new masters of the sky. 
It is like the time in which we live. But then I remembered the faith that I 
partly have expressed, faith in a universe not measured by our fears, a
universe that has thought and more than thought inside of it, and as I gazed,
after the sunset and above the electric lights there shone the stars.²⁵


Holmes in this passage holds his own with the most imaginative—and the 
most foreboding—twenty-first century transhumanists. The operatic ref- 
erence, with a little stretch, is even more evocative of change wrought by 
science than first appears. True, the characters in Wagner’s opera don’t use 
electric circuits for data processing.²⁶ But it is not too foreign to Holmes’s
speculations about the world-changing potential of statistics—or to the 
conceptual foundations of the machine learning age—that, in the Pro- 
logue to that last of the Ring Cycle operas, the Fates, whose vocation 
is to give prophecies, are weaving: and the rope with which they weave 
is made of the knowledge of all things past, present, and yet to come. 
The rope breaks, and thus the stage is set for the end of one world and 
the start of another.²⁷ Mythological data scientists foretelling the epochal
changes their science will soon effect! 

In less fanciful tenor, in Law in Science Holmes suggested that science 
might aid law and possibly replace it: 


I have had in mind an ultimate dependence upon science because it is 
finally for science to determine, so far as it can, the relative worth of our 
different social ends, and, as I have tried to hint, it is our estimate of the 
proportion between these, now often blind and unconscious, that leads us 
to insist upon and to enlarge the sphere of one [legal] principle and to 
allow another gradually to dwindle into atrophy.


Science, in Holmes’s view, would be put in harness to law; or it would
replace law by taking over the social functions that law for the time being 
serves. Richard Posner, Chief Judge of the U.S. Court of Appeals for the 
Seventh Circuit at the time, on the 100th anniversary of The Path of the 
Law read Holmes to contemplate that law would be “succeeded at some 
time in the future by forms of social control that perform the essential 
functions of law but are not law in a recognizable sense.”²⁹ The most
accomplished scholar of Holmes to have served on an American court 
in the present century, Posner also thought the passage about
Götterdämmerung and “cosmic destinies” noteworthy.³⁰ Whatever the precise
role Holmes contemplated for science, and wherever he thought science 
would take us, it is evident that Holmes’s philosophy did not equate with 
narrow presentism. Holmes was keenly interested in the future, including 
the future impact of science on law. 

Writers have cautioned against “scientism,”³¹ the unjustified confidence
in the potential for science to solve society’s problems. Our focus
here has not been to repeat well-known critiques of unexamined
enthusiasm for technological change. The acknowledgement of correlation
between new technologies and risk has tempered scientistic impulses³²;
admonitions have been sounded in regard to Holmes’s ideas about
science.³³ It nevertheless is timely to alert practitioners of computer science
that they ignore sanguinary lessons of the history of ideas if they place 
blind faith in the power of their craft. Holmes, perhaps, can be read for 
cautionary notes in that regard. 

But our chief purpose in this book has been to use the analogy from 
Holmes’s jurisprudence to cast light on machine learning. Let us ask, 
then, what, if any, lessons for the future of computer science might be 
found in Holmes’s speculations about the future of law. 


10.2 WHERE DID HOLMES THINK LAW WAS
GOING, AND MIGHT COMPUTER SCIENCE FOLLOW?


Holmes, in thinking about law, found interest in wider currents that law 
both is borne upon and drives. Holmes considered the possibility that 
science will replace law—more precisely, that scientific method and tech- 
nological advances will reveal rules and principles that law will adopt and 
thus give law a more reliable foundation. A curious irony would be if it 
went the other way around. A self-referential system, with prediction as
the system’s output and its input as well (which is to say Holmes’s concept
of law as he thought law actually is), is what computer scientists, we
speculate, might seek to make machine learning into. That has not been what
machine learning is. True, machine learning has moved beyond logic and 
so is now an inductive process of finding patterns in input data to attain 
an output. So far however that is the end of the road. If machine learn- 
ing goes further, if it comes to embody self-referential mechanisms such as 
Gödel and Turing devised in mathematics and computation, then machine 
learning will come to look even more like Holmes’s law—an inductive 
system of prediction-making and self-referential prediction-shaping. The 
law, as Holmes understood it, would then have foreshadowed the future 
of computer science. 

This is not how Holmes seems to have imagined things would go. We 
discern in his futurist and scientistic vein that Holmes thought that law, 
as he understood it to be, would give way to something else. What he 
thought law as prediction would give way to is not clear, but, as Judge 
Posner suggested, Holmes seems to have contemplated that science and 
technology would end society’s reliance on law and bring about new 
mechanisms of control. The new mechanisms would be based on logic, 
rather than experience, and thus, in Holmes’s apparent vision, would 
come full circle back to a sort of formalism—not a formalism based on 
arbitrary doctrines and rules, but based, instead, on propositions derived 
from what nineteenth-century thinkers conceived of as science.

Holmes’s speculations about science replacing law would seem to have 
a genealogy back to Leibniz, though we are not aware to what extent, if 
at all, Holmes was thinking about that antecedent when he wrote about a 
“scientific” future for law. Leibniz wrote about the possible use of math- 
ematical models to describe law and philosophy in sufficient detail and 
scope that (in Leibniz’s words), “if controversies were to arise, there 
would be no more need of disputation between two philosophers than 
between two accountants. For it would suffice for them to take their pen- 
cils in their hands and to sit down at the abacus and say to each other 
(with a friend if they wish): Let us calculate.”** It is indeed this branch 
of Leibniz’s thought that interests people, like Michael Livermore, who 
are considering how to put state-of-the-art computing to work on legal
problems.*° As we have suggested, however, it is Leibniz’s thinking about 
probability, not his speculation that fixed rules might one day answer legal 
questions, that has special resonance in a machine learning age. Leibniz 
thus, arguably, presaged Holmes, both in the application of probability 
theory to law and in the speculation that such application ultimately might
be set aside in favor of a universal body of rules. What is more, he may
have presaged Holmes, too, in looking past the part of his own thinking
that has real salience to machine learning.

The irony, then, would be if computing followed Holmes’s descrip- 
tion of law as he thought law is—not his speculations about where he 
thought law was going. Machine learning today finds itself on the path 
that Holmes understood law actually to traverse in his own day. Machine 
learning has shifted computer science from logical deduction to an induc- 
tive process of finding patterns in large bodies of data. Holmes’s realist 
conception of law shifted the law from rules-based formalism to a search 
for patterns in the collected experience of society. Reading Holmes as he 
understood the law to be, not his speculations about where law might 
go, we discern a path of the law that very much resembles that taken by 
machine learning so far. 

Along that path, Holmes supplied a complete description of law. He 
described law as prophecy—meaning that all instances of law, all its 
expressions, are prophecy, and each successive prediction, whatever its 
formal source, in turn shapes, to a greater or to a lesser degree, all the 
prophecies to come. There is thus a self-referential character in law’s 
inputs and outputs. In such self-reference, the law perhaps even antic- 
ipates a way ahead for machine learning: experience supplies the input 
from which present decisions are reached; and, in turn, those outputs 
become the inputs for future decisions. In short, though Holmes might 
have been waiting for technology to inform law, it could turn out that 
it is law that informs technology. The lawyers might have something to 
teach the computer scientists. 
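
The feedback loop described above, in which outputs become inputs for
future decisions, can be illustrated with a toy sketch. It is offered
only as an illustration under stated assumptions, not as a description
of how any real machine learning system or legal process works. The
class name, the numeric "feature," and the outcome labels are all
hypothetical, and the nearest-neighbour rule is simply a convenient
stand-in for inductive pattern finding.

```python
from dataclasses import dataclass, field


@dataclass
class SelfReferentialPredictor:
    """Toy 1-nearest-neighbour predictor whose outputs feed back into its inputs.

    Hypothetical illustration only: the names and data are assumptions.
    """

    # The "experience": (feature, outcome) pairs decided so far.
    record: list = field(default_factory=list)

    def predict(self, feature: float) -> str:
        # Inductive step: find the most similar past case and follow its outcome.
        _, outcome = min(self.record, key=lambda case: abs(case[0] - feature))
        return outcome

    def decide(self, feature: float) -> str:
        # Self-referential step: the prediction itself joins the record,
        # so it can shape every later prediction.
        outcome = self.predict(feature)
        self.record.append((feature, outcome))
        return outcome


# Two starting "precedents" at opposite ends of a made-up scale.
predictor = SelfReferentialPredictor(record=[(0.0, "liable"), (10.0, "not liable")])
first = predictor.decide(4.0)   # nearest past case is the one at 0.0
second = predictor.decide(6.5)  # nearest past case is now the decision at 4.0
```

In this sketch the second case is decided differently than it would have
been on the original record alone, because the first decision has itself
become part of the experience consulted.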


10.3 LESSONS FOR LAWYERS AND OTHER LAYPEOPLE 


Through a reading of Gödel, Turing, and Holmes in Chapter 9, we’ve 
identified a self-referential path that machine learning might follow. 
Regardless of where the technology goes from here, however, it is already 
too important for laypeople, including lawyers, to ignore. Thus we recall 
our initial task: to explain machine learning in terms that convey its essen- 
tials to the non-specialist. 

We have aimed in the chapters above to convey the essentials. We 
have done so with the aid and within the limits of an analogy between 
two revolutions—one in jurisprudence, one in computing. While a much 
wider audience needs to come to grips with machine learning, lawyers,
in view of their function in society, find themselves involved in distinctive
ways in the questions it presents. Coming at machine learning through a
legal analogy has, we hope, established some new connections that will
help non-specialists in general. The connections likely will have particular
salience for the lawyers.

Lawyers, whether by inclination or by habit, are conservative. Law has
much to do with authority, and legal argument seldom wins praise for 
conspicuous innovation. Legal minds, moreover, are skeptical; the enthu- 
siasm for new machines that enlivens a technologist is not prevalent 
among lawyers. And, yet, lawyers from time to time have been involved in 
revolutions. As we noted at the start of this exploration of the conceptual 
foundations of machine learning, probability theory—the common cur- 
rent on which the two revolutions addressed in the chapters above have 
been carried—owes much to thinkers who were educated in law. So, too, 
long after, influential ideas in law have come from lawyers whom science 
and technology have interested. 

H.L.A. Hart, having dedicated his Inaugural Lecture in 1952 as Pro- 
fessor of Jurisprudence at Oxford to Definition and Theory in Jurispru- 
dence, a few years later addressed his Holmes Lecture at Harvard to 
the challenge that arises when the legal system is called on to classify 
the sorts of “wonderful things” that solicited Holmes’s interest time and 
again through his career. “Human invention and natural processes,” Hart 
wrote... 


continually throw up such variants on the familiar, and if we are to say 
that these ranges of facts do or do not fall under existing rules, then the 
classifier must make a decision which is not dictated to him, for the facts 
and phenomena to which we fit our words and apply our rules are as it 
were dumb... Fact situations do not await us neatly labeled, creased, and 
folded, nor is their legal classification written on them to be simply read 
off by the judge.37 


The impact of machine learning, realized and anticipated, identifies it as 
a phenomenon that Hart would have recognized as requiring legal classi- 
fication. Lawyers and judges are called upon to address it with what rules 
they already have to hand. New legislation has attempted to address it 
in fresh terms. Explainability, an objective that we considered above, has 
motivated a range of new legislation, such as the GDPR, which became
applicable in the EU in 2018.38 Enactments elsewhere pursue a similar
objective, such as the California Consumer Privacy Act (CCPA), which
enters into force in 2020,39 as do a great many more.40 It is both too
early and beyond the scope of the present short book to take stock of the 
legislative output. It is not too early to observe the need for a wider, and 
more intuitive, understanding of what the legislator is being called upon 
to address. Law makers and practitioners need to understand the shift 
from algorithms to machine learning if they are to make good law and to 
practice it effectively. To attempt to label, crease, and fold machine learn- 
ing into a familiar, algorithmic form is a mistake that Hart would have 
cautioned us to avoid. 

Interestingly enough, Turing, so influential a figure in the line of 
human ingenuity that has interested us in the preceding chapters, seems 
to have been not too far removed from Holmes. The proximity was via 
Hart. Held by some to be the foremost legal philosopher since Holmes, Hart
called Holmes a “heroic figure in jurisprudence.”41 Hart addressed the
earlier jurist’s idea of the law in detail, partly in riposte to Holmes’s
critics.42 Hart did not write about Turing, but they were contemporaries,
and linked. During World War II, before Hart embarked on
a career as a legal academic, he was assigned to MI5, the British domes- 
tic intelligence agency. Hart’s responsibility was to lead the liaison unit 
between MI5 and project ULTRA, the latter having been under the juris- 
diction of MI6, the external intelligence agency. It was under project 
ULTRA, at a country house at Bletchley Park in England, that Turing did 
his codebreaking and developed the computational strategies that pro- 
vided the point of departure for modern computing. Turing’s work at 
Bletchley Park enabled MI6 to decipher encrypted German communica- 
tions. So closely, however, did MI6 guard ULTRA that it was not clear at 
the start that the liaison unit for which Hart was responsible would serve 
any purpose. It appears that Hart’s personal relations with key people in 
ULTRA played a role in getting the liaison to function—and, thus, in 
helping assure that Turing’s technical achievements would add practical 
value to the war effort.43 An eminent former student and colleague, John
Finnis, notes that Hart never divulged further details about his wartime
duties.44 Years after the war, but still some time before Turing’s rise to
general renown, Hart did mention Turing: he told family that he admired
him very much.45

That Turing’s renown now extends well beyond computer science 
evinces the wider recognition of computing’s importance to modern society.
Machine learning, as the branch of computing that now so influences
the field, requires a commensurate breadth of understanding. We have
written here about jurisprudence and the path to AI. Machine learning’s 
impact, however, extends well beyond the legal profession. Every walk of 
life is likely to feel its impact in the years to come. Existing rules might 
help with some of the problems to which the new technology will give 
rise, but lawyers and judges will not find all the answers ready to “read 
off” the existing rules. We hope that having presented the ideas and ways 
of thinking behind machine learning through an analogy with jurispru- 
dence will help lawyers to fold the new technology into the law—and 
will help laypeople fold it into the wider human experience across which 
machine learning’s impact now is felt. 

At the very least, we hope that lawmakers and people at large will stop 
using the word “algorithm” to describe machine learning, and that they 
will ask for “the story behind the training data” rather than “the logic 
behind the decision.”


EPILOGUE: LESSONS IN TWO DIRECTIONS


For an analogy to be worthwhile, it should tell one side or the other—the 
lawyer or the computer scientist here—something about the other side’s 
discipline that she (i) does not know or understand; and (ii) wishes to 
know or understand, or should. What have we, the two authors, learnt 
from this analogy? 


A DATA SCIENTIST’S VIEW


This time last year, I knew nothing about Holmes. I was astonished to 
find out that the big debate between statistical inference and machine 
learning, currently being played out in university departments and academic
journals, was prefigured by a nineteenth-century lawyer. It’s
impressive enough that Holmes argued for pattern finding from expe- 
rience rather than logic, well before the birth of modern statistics in the 
1920s. It’s even more impressive that he took the further leap from sci- 
entific rule-inference to machine-learning style prediction. 

Holmes, when he opened the door to experience and pattern finding 
and prediction, seems not to have been troubled by the implications. It 
seems he had a heady Victorian confidence in science, and he was quite 
sure that he could perfectly well put together a page of history worth as 
much as a volume of logic. But the arguments that Holmes let in when 
he opened that door—arguments about bias, validity, explanation, and so 
on—are still rumbling, even though legal thinking has had 120 years to 
deal with them.

Will machine learning still be dealing with these arguments in the year
2140? At the moment it hasn’t even caught up with some of the sophis- 
ticated ideas that legal thinking has generated so far. And maybe these 
arguments can never be resolved: maybe it’s impossible to hand over 
responsibility to a formal system, and the burden is on every one of us to 
learn how to think better with data and experience. Nevertheless, machine 
learning has a useful brusqueness: ideas that lead to working code are 
taken forwards; other ideas are sooner or later left by the wayside. It also
has exponentially growing datasets and computing power, which give it 
an ability to step up the abstraction ladder in a way that the law cannot. 
So I am optimistic that machine learning will lead to intellectual advances, 
not just better kitten detectors. 

Holmes would surely find the present day a most exciting time to be 
alive. 


DJW, August 2019 


A LAWYER’S VIEW 


I was in a seminar in the early 1990s at Yale, taught by the then law 
school dean Guido Calabresi, where a fellow student suggested a compar- 
ison between software and statutory text. I don’t recall the detail, except 
that the comparison was rather elaborate, and Calabresi, impatient with 
it because it wasn’t going anywhere, tried to move the conversation on, 
but the student persisted—pausing only to say, “No, wait. I’m on a roll.” 
Seeing an opening, the dean affected a dramatic pose; turned to the rest 
of us; and, over the student who otherwise showed no sign he’d stop, 
pleaded in tremulous tone: “You’re on a roll?” The scene sticks with me 
for its admonitory value: don’t try describing law with computer analo- 
gies. 

In the quarter century since, however, neither law nor computer 
science has left the other alone. The current debates over explain- 
ability, accountability, and transparency of computer outputs point to 
mutual entanglement. Doing something about decisions reached with 
machine learning has entered the legislative agenda, and litigators have 
machine learning and “big data” on their radar. Meanwhile, engineers 
and investors are looking for ways to make machine learning do ever more 
impressive things. The rest of us, lawyers included, in turn, scramble to 
respond to the resultant frictions and figure out, in day-to-day terms, what
machine learning means for us. Could we be missing something basic in
the urgency of the moment? 

Hearing from people in Cambridge and elsewhere who work on AI 
and machine learning, I had gotten an intuitive sense that it isn’t help- 
ful to talk about computers getting smarter. It didn’t sound fine-grained 
enough to convey what I reckoned an educated layperson ought to know 
about the topic, nor did it sound quite to the point. Accounts from dif- 
ferent specialists over a couple of years added to the picture for me, but 
I couldn’t shake the feeling that there might be something basic missing 
in even the specialists’ appreciation of what they are describing. 

To no conscious purpose having anything to do with machine learn- 
ing, over the holidays in December 2018 I took to re-reading some of 
Holmes’s work, including The Path of the Law. A seeming connection 
roused my curiosity: between Holmes’s idea of experience prevailing over 
logic and machine learning’s reliance on data instead of software code. 
Maybe it was a nice point, but probably not more than that. 

Considering Holmes more closely, however, I started to wonder 
whether there might be something useful in the comparison. The further 
I looked, the more Holmes’s ideas about law seemed to presage prob- 
lems in machine learning, including, as I came to learn, some that aren’t 
widely known. To take a relatively familiar problem, there is the risk to 
societal values when a decision-maker is obscure about how he or she (or 
it) reached a decision. Scholars over the years have noticed in Holmes’s 
work a seeming unconcern about values. Noted at times, but not as often, 
has been Holmes’s concern over how to explain decisions. Holmes res- 
onated as I thought about the current debate over explainability of AI 
outputs. 

Sometimes missed altogether when people read Holmes is the fullness, 
in its forward-leaning, of the idea that law is prophecy. I came to under- 
stand that machine learning isn’t just a better way to write software code; 
it’s a way of re-formulating questions to turn them into prediction prob- 
lems, and, as we’ve argued, the links to Holmes’s idea of prophecy are 
remarkable. Venturing to see where the analogy to Holmes might go, and 
testing it with my co-author, I came to appreciate what machine learning 
does today that’s so remarkable—and, also, what it has not yet done. 

Holmes’s “turn toward induction” is an antecedent to a situation in 
law that lawyers in the United States call a crisis. A turn toward Holmes,
however, might be helpful, whatever one’s legal philosophy, for some light
it casts on machine learning.


TDG, August 2019